           PI +LGLS+F+ +L+ L +K +V++ K + F L+W A+ E RKQSILKF 

Sbjct: 121 PIFRRLGLSLFIFIILVLILLALKRVVLSRKTRYFLRGNRLDWAKAVAFESNRKQSILKF 180 

Query: 181 FSLFTNVKGISTSVKRRSFLDGILKLISKTPSRLWTNLFVRAFLRSSDYLGLTIRLVTLN 240 
           +SLFT VKGIST VK R++L+ +LKL+ +TPS LW +L+ RAFLRSSDYLGL +RL+ L+ 

Sbjct: 181 YSLFTTVKGISTKVKERTYLNPLLKLVKQTPSNLWLSLYARAFLRSSDYLGLFLRLMLLS 240 

Query: 241 ILSVIFVNETYLALALAFVFNYLLLFQLLALGHHFDYQYMNQLYPVRLNAKASQLKGFLR 300 
           LSV F++ YL+++LA +FNYL++FQLL+L +H+DY YM LYP +K + FLR 
Sbjct: 241 SLSVFFIHNLYLSVSIALIFNYLWFQLLSLYYHYDYHYMTSLYPENSRSKKKNMLSFLR 300 

Query: 301 VLSYAVTVIDSILIRELKPVILLIVLMLIVTEYYIPYKIKKMID 344 

LS+ + +++ + ++LIV M+ + Y+PYK+KK+ ID 

Sbjct: 301 GLSFLMLIVNMLCCSSAPKALILIVGMVFIACIYLPYKLKKIID 344 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1120 

A DNA sequence (GBSx1195) was identified in S.agalactiae <SEQ ID 3463> which encodes the amino 
acid sequence <SEQ ID 3464>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2821 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 
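
The localisation predictions listed under Final Results follow a regular pattern: a compartment name, a certainty score and a verdict in parentheses. The following Python sketch is an illustration only and is not part of the original analysis. The function name `parse_localization` is hypothetical, and the tolerance for stray spaces inside the number is an assumption about how scanned copies of such output may be damaged.

```python
import re

# compartment, certainty score (possibly with stray spaces), verdict in parentheses
LINE = re.compile(
    r"bacterial\s+(\w+).*?Certainty[^=]*=\s*([0-9][0-9.\s]*?)\s*\(([^)]*)\)"
)

def parse_localization(line):
    """Return (compartment, score, verdict), or None if the line does not match."""
    m = LINE.search(line)
    if m is None:
        return None
    site, raw_score, verdict = m.groups()
    # Remove any whitespace that extraction left inside the number.
    score = float(re.sub(r"\s+", "", raw_score))
    return site, score, verdict

lines = [
    "bacterial cytoplasm Certainty=0. 2 821 (Affirmative) < succ>",
    "bacterial membrane Certainty=0 . 0000 (Not Clear) < succ>",
    "bacterial outside Certainty=0.0000 (Not Clear) < succ>",
]
parsed = [parse_localization(text) for text in lines]
# The compartment with the highest certainty is taken as the predicted location.
best = max(parsed, key=lambda record: record[1])
print(best)
```

Selecting the highest score is a design choice of this sketch; the source only reports the three scores side by side.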

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00284 GB:AF008220 YtmP [Bacillus subtilis] 
Identities = 69/214 (32%), Positives = 121/214 (56%), Gaps = 1/214 (0%) 

Query: 12 PLRGKSGKAYIGTYPNGERVFVKYNTTPILPALAKEQIAPQLLWARRTSNGDMMSAQEWL 71 
          P G +G AY + NG+++F+K N++P L L+ E I P+L+W +R NGD+++AQ W+ 
Sbjct: 20 PAGGATGDAYYAKH-NGQQLFLKl^SSPFI^VLSAEGIVPKLVWTKRMENGDVITAQHWM 78 

Query: 72 DGRTLTKEDMGSKQIIHILLRLHKSRPLVNQLLQLGYKIENPYDLLMDWEKQTPIQIREN 131 
          GR L +DM + + +L ++H S+ L++ L +LG + NP LL ++ + + 
Sbjct: 79 TGRELKPKDMSGRPVAELLRKIHTSKALLDMLKRLGKEPLNPGALLSQLKQAVFAVQQSS 138 

Query: 132 TYLQSIVTELKRSLPEFRTEVATIVHGDIKHSNWVITTSGLIYLVDWDSVRLTDRMYDVA 191 
           +Q + L+ L E + H D+ H+NW+ + + +YL+DWD + D D+ 
Sbjct: 139 PLIQEGIKYLEEHLHEVHFGEKVVCHCDVNHNNWLLSEDNQLYLIDWDGAMIADPAMDLG 198 

Query: 192 YILSHYIPQKHWKDWLSYYGYKDNEKVWSKIIWY 225 
           +L HY+ + W+ WLS YG + E + ++ WY 
Sbjct: 199 PLLYHYVEKPAWESWLSMYGIELTESLRLRMAWY 232 
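
The summary line that heads this alignment (identities, positives and gaps, each given as count/length with a percentage) can be checked mechanically. The following Python sketch is offered only as an illustration; the function name `parse_summary` is hypothetical, and the assumption that the printed percentage is the truncated ratio is inferred from this one line rather than stated in the source.

```python
import re

# One field of a summary line, e.g. "Identities = 69/214 (32%)".
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)

def parse_summary(line):
    """Map each field name to (count, aligned_length, printed_pct, recomputed_pct)."""
    stats = {}
    for name, count, length, printed in FIELD.findall(line):
        count, length = int(count), int(length)
        # Assumption: the printed percentage is count/length truncated to an integer.
        stats[name] = (count, length, int(printed), 100 * count // length)
    return stats

line = "Identities = 69/214 (32%), Positives = 121/214 (56%), Gaps = 1/214 (0%)"
stats = parse_summary(line)
for name, (count, length, printed, recomputed) in stats.items():
    print(f"{name}: {count}/{length} printed={printed}% recomputed={recomputed}%")
```

A printed value that differs from the recomputed one does not by itself indicate an error; it only marks a line worth comparing against the source listing.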

A related DNA sequence was identified in S.pyogenes <SEQ ID 3465> which encodes the amino acid 
sequence <SEQ ID 3466>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2686 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 214/262 (81%), Positives = 242/262 (91%) 



Query: 1 MTISNQELTLTPLRGKSGKAYIGTYPNGERVFVKYNTTPILPALAKEQIAPQLLWARRTS 60 
         +T + QELTLTPLRGKSGKAY GTYPNGE VF+K NTTPILPALAKEQIAPQLLWA+R 
Sbjct: 1 VTTTEQELTLTPLRGKSGKAYKGTYPNGECVFIKIiNTTPILPAIAKEQIAPQLLWAKRMG 60 

Query: 61 NGDMMSAQEWLDGRTLTKEDMGSKQIIHILLRLHKSRPLVNQLLQLGYKIENPYDLLMDW 120 
          NGDMMSAQEWL+GRTLTKEDM SKQIIHILLRLHKS+ LVNQLLQL YKIENPYDLL+D+ 
Sbjct: 61 NGDIWSAQEWIiNGRTLTKEDr^SKQIIHILLRLHKSKKIiWQLLQIiNYKIENPYDLIiVDF 120 

Query: 121 EKQTPIQIRENTYLQSIVTELKRSLPEFRTEVATIVHGDIKHSNWVITTSGLIYLVDWDS 180 
           E+ P+QI++N+YLQ+IV ELKRSLPEF++EVATIVHGDIKHSNWVITTSG+I+LVDWDS 
Sbjct: 121 EQNAPLQIQQNSYLQAIVKELKRSLPEFKSEVATIVHGDIKHSNWVITTSGMIFrjVDWDS 180 

Query: 181 VRLTDRMYDVAYILSHYIPQKHWKDWLSYYGYKDNEKVWSKIIWYGQFSYLSQIIKCFDK 240 
           VRLTDRMYDVAY+LSHYIP+ W +WLSYYGYK+N+KV KIIWYGQFS+L+QI+KCFDK 
Sbjct: 181 VRLTDRMYDVAYLLSHYIPRSRWSEWLSYYGYKNNDKVMQKIIWYGQFSHLTQILKCFDK 240 

Query: 241 RDMEHVNQEIYELRKFRELIKK 262 
           RDMEHVNQEIY LRKFRE+ +K 
Sbjct: 241 RDMEHVNQEIYALRKFREIFRK 262 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1121 

A DNA sequence (GBSx1196) was identified in S.agalactiae <SEQ ID 3467> which encodes the amino 
acid sequence <SEQ ID 3468>. Analysis of this protein sequence reveals the following: 
Possible site: 51 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4529 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00285 GB:AF008220 YtmQ [Bacillus subtilis] 
Identities = 126/211 (59%), Positives = 161/211 (75%) 

Query: 1 MRVRKRKGAEEHLENNPHYVISNPEEAKGRWHEIFGNNNPIHIEVGSGKGAFITGMAEQN 60 
         MR+R + A++ L N ISNP + KG+W+ +FGN+NPIHIEVG+GKG FI+GMA+QN 
Sbjct: 1 MRMRHKPWADDFIAENADIAISNPADYKGKHNTVFGNDNPIHIEVGTGKGQFISGMAKQN 60 

Query: 61 PDINYIGIDIQLSVLSYALDKVLDSGAKNIKLLLVDGSSLSNYFDTGEVDLMYLNFSDPW 120 
          PDINYIGI++ SV+ A+ KV DS A+N+KLL +D +L++ F+ GEV +YLNFSDPW 
Sbjct: 61 PDINYIGIELFKSVIVTAVQKVKDSEAQOTKLIjNIDADTLTDVFEPGEVKRVYLNFSDPW 120 

Query: 121 PKKKHEKRRLTYKTFLDTYKDILPEQGEIHFKTDNRGLFEYSLASFSQYGMTLKQVWLDL 180 
           PKK+HEKRRLTY FL Y++++ + G IHFKTDNRGLFEYSL SFS+YG+ L V LDL 
Sbjct: 121 PKKRHEKRRLTYSHFLKKYEEVMGKGGSIHFKTDNRGLFEYSLKSFSEYGLLLTYVSLDL 180 

Query: 181 HASDYQQNIMTEYERKFSNKGQVIYRVEARF 211 
           H S+ + NIMTEYE KFS GQ IYR E + 
Sbjct: 181 HNSNLEGNIMTEYEEKFSALGQPIYRAEVEW 211 





A related DNA sequence was identified in S.pyogenes <SEQ ID 3469> which encodes the amino acid 
sequence <SEQ ID 3470>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3303 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 179/211 (84%), Positives = 193/211 (90%) 

Query: 1 MRVRKRKGAEEHLENNPHYVISNPEEAKGRWHEIFGNNNPIHIEVGSGKGAFITGMAEQN 60 
         MRVRKRKGAEEHL NNPHYVI NPE+AKGRWH++FGN+ PIHIEVGSGKG FITGMA +N 
Sbjct: 1 MRVRKRKGAEEHLANNPHYVILNPEDAKGRWHDVFGNDRPIHIEVGSGKGGFITGMMj 60 

Query: 61 PDINYIGIDIQLSVLSYALDKVLDSGAKNIKLLLVDGSSLSNYFDTGEVDLMYLNFSDPW 120 
          PDINYIGIDIQLSVLSYALDKVL S N+KLL VDGSSL+NYF+ GEVD+MYLNFSDPW 
Sbjct: 61 PDINYIGIDIQLSVLSYALDKOTiASEVPNVKLIiRVDGSSLTl^FEDGEVDMMYIiNFSDPW 120 

Query: 121 PKKKHEKRRLTYKTFLDTYKDILPEQGEIHFKTDNRGLFEYSLASFSQYGMTLKQVWLDL 180 
           PK KHEKRRLTYK FLDTYK ILPE GEIHFKTDNRGLFEYSLASFSQYGMTL+Q+WLDL 
Sbjct: 121 PKTKHEKRRLTYKDFLDTYKRILPEHGEIHFKTDNRGLFEYSLASFSQYGMTLRQIWLDL 180 

Query: 181 HASDYQQNIMTEYERKFSNKGQVIYRVEARF 211 
           HAS+Y+ N+MTEYE KFSNKGQVIYRVEA F 
Sbjct: 181 HASNYEGNVMTEYEEKFSNKGQVIYRVEANF 211 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1122 

A DNA sequence (GBSx1197) was identified in S.agalactiae <SEQ ID 3471> which encodes the amino 
acid sequence <SEQ ID 3472>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06136 GB:AP001515 unknown conserved protein [Bacillus halodurans] 
Identities = 61/124 (49%) , Positives = 81/124 (65%) , Gaps = 2/124 (1%) 

Query: 2 GGDYVLSILIDKPGGITVEDTAQLTDWSPLLDTIQPDPFPEQYMLEVSSPGLERPLKTA 61 

G D+ L + ID G+ +ED ++++ +S LD + DP + Y LEVSSPG ERPLK 
Sbjct: 33 GKDWFLRVFIDSETGVDLEDCGKVSERLSEKLD--ETDPIEQAYFLEVSSPGAERPLKRE 90 

Query: 62 EALSNAVGSYINVSLYKSIDKVKIFEGDLLSFDGETLTIDYMDKTRHKTVDIPYQTVAKA 121 

+ L ++G ++V+LY+ ID K EG+L FDGETLTI+ KTR KTV IPY VA A 
Sbjct: 91 KDLLRSIGKNVHVTLYEPIDGEKALEGELTEFDGETLTIEIKIKTRKKTVTIPYAKVASA 150 

Query: 122 RLAV 125 
RLAV 

Sbjct: 151 RLAV 154 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3473> which encodes the amino acid 
sequence <SEQ ID 3474>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.1311 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

Final Results 

bacterial cytoplasm Certainty=0.3445 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 101/127 (79%), Positives = 117/127 (91%) 

Query: 1 MGGDYVLSILIDKPGGITVEDTAQLTDWSPLLDTIQPDPFPEQYMLEVSSPGLERPLKT 60 
         MG DY+LSIL+DK GGITVEDT+ LT+++SPLLDTI PDPFP QYMLEVSSPGLERPLKT 
Sbjct: 52 MGSDYILSILVDKEGGITVEDTSDLTNIISPLLDTIDPDPFPNQYMLEVSSPGLERPLKT 111 

Query: 61 AEALSNAVGSYINVSLYKSIDKVKIFEGDLLSFDGETLTIDYMDKTRHKTVDIPYQTVAK 120 
          A++L AVGSYINVSLY++IDKVK+F+GDLL+FDGETLTIDY+DKTRHK V+IPYQ VAK 
Sbjct: 112 ADSLKAAVGSYINVSLYQAIDKVKVFQGDLLAFDGETLTIDYLDKTRHKIVNIPYQAVAK 171 

Query: 121 ARLAVKL 127 
            R+AVKL 
Sbjct: 172 VRMAVKL 178 
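
The "Identities" figure reported for an alignment counts the columns in which the query and subject rows carry the same residue. As a minimal illustration, the Python sketch below counts identities for two aligned rows of equal length. The function name `count_identities` is hypothetical, and "positives" are not computed because they depend on a substitution matrix that this sketch does not assume.

```python
def count_identities(query_row, subject_row):
    """Count columns where both aligned rows carry the same residue (gaps excluded)."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have equal length")
    return sum(
        1
        for q, s in zip(query_row, subject_row)
        if q == s and q != "-"
    )

# Final row of the alignment above: query residues 121-127, subject residues 172-178.
identical = count_identities("ARLAVKL", "VRMAVKL")
print(identical, "of", len("ARLAVKL"))
```

Summing this count over every row of an alignment gives the numerator of the identities ratio.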

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1123 

A DNA sequence (GBSx1198) was identified in S.agalactiae <SEQ ID 3475> which encodes the amino 
acid sequence <SEQ ID 3476>. This protein is predicted to be N utilization substance protein A homolog 
(nusA). Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.5069 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9565> which encodes amino acid sequence <SEQ ID 9566> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13533 GB:Z99112 nusA [Bacillus subtilis] 
Identities = 164/370 (44%), Positives = 251/370 (67%), Gaps = 15/370 (4%) 

Query: 4 MSKEMLEAFRILEEEKHINKEDIIDAVTESLKSAYKRRYGQSESCVIEFNEKKADFTVYT 63 
         MS E+L+A ILE+EK I+KE II+A+ +L SAYKR + Q+++ ++ N + V+ 
Sbjct: 1 MSSELLDALTILEKEKGISKEIIIEAIEAALISAYKRNFNQAQNVRVDIJSIRETGSIRVFA 60 

Query: 64 VREVVDEVFDSRLEISLKDALAISSAYELGDKIRFEESVTEFGRVAAQSAKQTIMEKMRR 123 
          ++WDEV+D RLEIS+++A I Y +GD + E + + FGR+AAQ+AKQ + +++R 
Sbjct: 61 RKDWDEVYDQRLEISIEEAQGIHPEYMVGDVVEIEVTPKDFGRIAAQTAKQVVTQRVRE 120 

Query: 124 QMREVTFNEYKQHEGEIMTGTVERFDQRFIYVNLGSLEAQLSHQDQIPGESFKSHDMIDV 183 
           R V ++E+ E +IMTG V+R D +FIYV+LG +EA L +Q+P ES+K HD I V 
Sbjct: 121 AERGVIYSEFIDREEDIMTGIVQRLDNKFIYVSLGKIEALIiPVNEQMPNESYKPHDRIKV 180 

Query: 184 YVYKVENNPKGVNVFVSRSHPEFIKRIMEREIPEVFDGTVEIMSVSREAGDRTKVAVRSH 243 
           Y+ KVE KG ++VSR+HP +KR+ E E+PE++DGTVE+ SV+REAGDR+K++VR+ 
Sbjct: 181 YITKVEKTTKGPQIWSRTHPGLLKRLFEIEOTEIYDGTVELKSVAREAGDRSKISVRTD 240 

Query: 244 NSNVDAIGTIVGRGGSNIKKVISNFHPKRVDAKTGLEIPVEENIDVIQWVEDPAEFIYNA 303 
           + +VD +G+ VG G ++ +++ E ID++ W DP EF+ NA 
Sbjct: 241 DPDVDPVGSCVGPKGQRVQAIVNELK GEKIDIVNWSSDPVEFVANA 286 

Query: 304 IAPAEVDMVLFDDEDTKRATVWPDSKLSLAIGRRGQNVRLAAHLTGYRIDIKSASEYEK 363 
           ++P++V V+ ++E+ K TV+VPD +LSLAIG+RGQN RLAA LTG++IDIKS ++ + 
Sbjct: 287 LSPSKA/LDVIVNEEE-KATWIVPDYQLSIAIGKRGQNftRIAAKLTGWKIDIKSETDARE 345 

Query: 364 MEAQELQTEE 373 
           + + EE 
Sbjct: 346 LGIYPRELEE 355 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3477> which encodes the amino acid 
sequence <SEQ ID 3478>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2074 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 348/380 (91%), Positives = 361/380 (94%), Gaps = 2/380 (0%) 

Query: 4 MSKEMLEAFRILEEEKHINKEDIIDAVTESLKSAYKRRYGQSESCVIEFNEKKADFTVYT 63 
         MSKEMLEAFRILEEEKHI+K DIIDAVTESLKSAYKRRYGQSESCVIEFNEK ADF V+T 
Sbjct: 12 MSKEMLEAFRILEEEKHIDKADIIDAVTESLKSAYKRRYGQSESCVIEFNEKTADFQVFT 71 

Query: 64 VREVVDEVFDSRLEISLKDALAISSAYELGDKIRFEESVTEFGRVAAQSAKQTIMEKMRR 123 
          VREVV+EVFDSRLEISLKDALAISSAYELGDKIRFEESV EFGRVAAQSAKQTIMEKMRR 
Sbjct: 72 VREVVEEVFDSRLEISLKDALAISSAYELGDKIRFEESVNEFGRVAAQSAKQTIMEKMRR 131 

Query: 124 QMREVTFNEYKQHEGEIMTGTVERFDQRFIYVNLGSLEAQLSHQDQIPGESFKSHDMIDV 183 
           QMREV FNEYK+HEGEIMTGTVERFDQRFIYVNLGSLEAQLSHQDQIPGE+FKSHD IDV 
Sbjct: 132 QMREVMFNEYKEHEGEIMTGTVERFDQRFIYVNLGSLEAQLSHQDQIPGETFKSHDRIDV 191 

Query: 184 YVYKVENNPKGVNVFVSRSHPEFIKRIMEREIPEVFDGTVEIMSVSREAGDRTKVAVRSH 243 
           YVYKVENNPKGVNVFVSRSHPEFIKRIME+EIPEVFDGTVEIMSVSREAGDRTKVAVRSH 
Sbjct: 192 YVYKVENNPKGVNVFVSRSHPEFIKRIMEQEIPEVFDGTVEIMSVSREAGDRTKVAVRSH 251 

Query: 244 NSNVDAIGTIVGRGGSNIKKVISNFHPKRVDAKTGLEIPVEENIDVIQWVEDPAEFIYNA 303 
           N NVDAIGTIVGRGGSNIKKVIS FHPKRVDAKTGLEIPVEENIDVIQWV+DPAEFIYNA 
Sbjct: 252 NPNVDAIGTIVGRGGSNIKKVISKFHPKRVDAKTGLEIPVEENIDVIQWVDDPAEFIYNA 311 

Query: 304 IAPAEVDMVLFDDEDTKRATVWPDSKLSLAIGRRGQNVRLAAHLTGYRIDIKSASEYEK 363 
           IAPAEVDMVLFDDED KRATVWPDSKLSLAIGRRGQNVRLAAHLTGYRIDIKSASEY++ 
Sbjct: 312 IAPAEVDMVLFDDEDLKRATVWPDSKLSLAIGRRGQNVRLAAHLTGYRIDIKSASEYDR 371 

Query: 364 MEAQELQTEEVAQESEVISD 383 
           +EA+ + A E V+ D 
Sbjct: 372 LEAE--KEAATAVEEPWDD 389 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1124 

A DNA sequence (GBSx1199) was identified in S.agalactiae <SEQ ID 3479> which encodes the amino 
acid sequence <SEQ ID 3480>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2012 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13534 GB:Z99112 alternate gene name: ymxB-similar to 
hypothetical proteins [Bacillus subtilis] 
Identities = 46/92 (50%), Positives = 67/92 (72%), Gaps = 1/92 (1%) 

Query: 1 MAKTKKIPLRKSVVSGEVIDKRDLLRIVKNKEGQVFIDPTGKQNGRGAYIKLDNDEAILA 60 
         M K KKIPLRK W+GE+ K++L+R+V++KEG++ +DPTGK+NGRGAY+ LD + + A 
Sbjct: 1 MNKHKKIPLRKCVVTGEMKPKKELIRVVRSKEGEISVDPTGKKNGRGAYLTLDKECILAA 60 

Query: 61 KKKRVFDRSFSMEVSDEFYDELLAYVDHKVKR 92 
          KKK F ++ D+ +DELL + KVK+ 
Sbjct: 61 KKKNTLQNQFQSQIDDQIFDELLEIAE-KVKK 91 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3481> which encodes the amino acid 
sequence <SEQ ID 3482>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1008 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 77/98 (78%), Positives = 92/98 (93%) 

Query: 1 MAKTKKIPLRKSVVSGEVIDKRDLLRIVKNKEGQVFIDPTGKQNGRGAYIKLDNDEAILA 60 
         M+K KKIPLRKS+VSGE+I KRDLLRIVK K+GQVFIDPTGKQNGRGAYIKLDN EA++A 
Sbjct: 2 MSKVKKIPLRKSLVSGEIIAKRDLLRIVKIKDGQVFIDPTGKQNGRGAYIKLDNQEALMA 61 

Query: 61 KKKRVFDRSFSMEVSDEFYDELLAYVDHKVKRRELGLE 98 
          KKK+VF+RSFSM++ + FYD+L+AYVDHK+KRRELGL+ 
Sbjct: 62 KKKQVFNRSFSMDIPESFYDDLIAYVDHKIKRRELGLD 99 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1125 

A DNA sequence (GBSx1200) was identified in S.agalactiae <SEQ ID 3483> which encodes the amino 
acid sequence <SEQ ID 3484>. This protein is predicted to be a probable ribosomal protein in the infB 5' region. 
Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 






>GP:BAB06133 GB:AP001515 unknown conserved protein [Bacillus halodurans] 
Identities = 46/95 (48%), Positives = 65/95 (68%), Gaps = 1/95 (1%) 

Query: 6 KVLNLIGIAQRAGRLITGEELVIKAIQNQQVSLIFLANDAGPNLTKKVTDKSNYYKTEVS 65 
         K L+L+GLA Rft. +L+TGEE V+KA+QN QV+L+ L++DAG + KK+ DK Y+ V 
Sbjct: 5 KWLSLLGIAARARQLLTGEEQVVKAVQNGQVTLVILSSDAGIHTKKKLLDKCGSYQIPVK 64 

Query: 66 TVFSTLELSDALGK-PRKVVAVADAGFSKKMRTLM 99 
          V + L A+GK R V+ V DAGFS+K+ L+ 
Sbjct: 65 WGNRQMLGRAIGKHERWIGVKDAGFSRKLAALI 99 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3485> which encodes the amino acid 
sequence <SEQ ID 3486>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1950 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 75/99 (75%), Positives = 88/99 (88%) 

Query: 1 MNNSEKVLNLIGIAQRAGRLITGEELVIKAIQNQQVSLIFLANDAGPNLTKKVTDKSNYY 60 
         + N E++ +LIG AQRAG++I+GEELV+KAIQ+QQV L+FLANDAGPN+TKKVTDKSNYY 
Sbjct: 1 LTNLERLSSLIGPAQRAGKVISGEELVVKAIQHQQVILVFLANDAGPNVTKKVTDKSNYY 60 

Query: 61 KTEVSTVFSTLELSDALGKPRKVVAVADAGFSKKMRTLM 99 
          EVSTV + LELS ALGKPRKV A+ADAGFSKKMRTLM 
Sbjct: 61 NVEVSTVLNALELSAALGKPRKVAAIADAGFSKKMRTLM 99 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1126 

A DNA sequence (GBSx1201) was identified in S.agalactiae <SEQ ID 3487> which encodes the amino 
acid sequence <SEQ ID 3488>. Analysis of this protein sequence reveals the following: 



Possible site: 37 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.2873 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10959> which encodes amino acid sequence <SEQ ID 
10960> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3489> which encodes the amino acid 
sequence <SEQ ID 3490>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2985 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 735/961 (76%), Positives = 805/961 (83%), Gaps = 42/961 (4%) 

Query: 1 MSKKRLHEIAKEIGKTSKEWEQAQSLGLPVKSHASSVEENDATRIVESFS-SSKTKAPT 59 
         +SKKRLHEIAKEIGK+SKEWE A+ LGL VKSHASSVEE DA +I+ SFS +SK 
Sbjct: 1 LSKKRLHEIAKEIGKSSKEWEHAKYLGLDVKSHASSVEEADAKKIISSFSKASKPDVTA 60 

Query: 60 NSVQTNQGVKTESKTVETKQGLSDDKPSTQPVAKPKPQSRNFKAEREARAKAEAEKRQHN 119 
          + +V S TV t G St TQ V+KPK SRNFKAEREARAK +A ++Q N 
Sbjct: 61 SQTVKPKEVAQPSVTVVKETG-SEHVEKTQ-VSKPK--SRNFKAEREARAKEQAARKQAN 116 

Query: 120 GD HRKNNRHNDTRSDDRR- -HQGQKRSNGNR NDNRQ--G 154 
           G +R+ N H D+R H+ Q +N R +DN Q G 
Sbjct: 117 GSSHRSQERRGGYRQPNNHQTNEQGDKRITHRSQGDTNDKRIERKASNVSPRHDNHQLVG 176 

Query: 155 QQNN RNKNDGRYADHKQKPQTRPQQPAGNRIDFKARAAALKAEQNAEYSRHSEQRF 210 
           +N N +GR+ + K++ + PQ + +IDFKARAAALKAEQNAEYSR SE RF 
Sbjct: 177 DRNRSFAKEMHKNGRFTNQKKQGRQEPQSKSP-KIDFKARAAALKAEQNAEYSRQSETRF 235 

Query: 211 REEQFAKRQAAKEQEIAKAAALKAQEEAQKAKEKIASKPVAKVKEIVNKVAATPSQTADS 270 
           R +QEAKR A ++ AK AALKAQ E +EAK+ + ++ + TAD+ 
Sbjct: 236 RAQQEAKRLAELARQEAKEAALKAQAEEMSHREA-ALKSIEEAETKLKSSNISAKSTADN 294 

Query: 271 RRKKQTRSDKSRQFS^m;NEDGQKQTRNKKNWl^QNQVRNQRNSN™HNKKNKKGK T 326 
           RRKKQ R +K+R+ ++ +++GQK +NKK+WN+QNQVRNQ+NSNWN NKK KKGK T 
Sbjct: 295 RRKKQARPEKNRELTHHSQEGQK- - KNKKSWNSQNQVRNQKNSNWNKNKKTKKGKNVKNT 352 

Query: 327 NGAPKPVTERKFHELPKEFEYTEGMTVAEIAKRIKREPAEIVKKLFMMGVMATQNQSLDG 386 
           N APKPVTERKFHELPKEFEYTEGMTVAEIAKRIKREPAEIVKKLFMMGVMATQNQSLDG 
Sbjct: 353 OTAPKPVTERKFHELPKEFEYTEGMTVAEIAKRIKREPAEIVKKLFMMGVMATQNQSLDG 412 

Query: 387 DTIELLMVDYGIEAHAKVEVDEADIERFFADEDYI^PDNLTERPPVVTIMGHVDHGKTTL 446 
           DTIELLMVDYGIEA AKVEVD+ADIERFF DE+YLNP+N+ ER PWTIMGHVDHGKTTL 
Sbjct: 413 DTIELLMVDYGIEAKAKVEVDDADIERFFEDEOTLNPENIVERAPVVTIMGHVDHGKTTL 472 

Query: 447 LDTLRNSRVATGEAGGITQHIGAYQIEEAGKKITFLDTPGHAAFTSMRARGASVTDITIL 506 
           LDTLRNSRVATGEAGGITQHIGAYQIEEAGKKITFLDTPGHAAFTSMRARGASVTDITIL 
Sbjct: 473 LDTLRNSRVATGEAGGITQHIGAYQIEEAGKKITFLDTPGHAAFTSMRARGASVTDITIL 532 

Query: 507 IVAADDGVMPQTVEAINHSKAAGVPIIVAINKIDKPGANPERVISELAEHGVISTAWGGE 566 
           IVAADDGVMPQT+EAINHSKAAGVPIIVAINKIDKPGANPERVI+ELAE+G+ISTAWGGE 
Sbjct: 533 IVAADDGVMPQTIEAINHSKAAGVPIIVAINKIDKPGANPERVIAELAEYGIISTAWGGE 592 

Query: 567 SEFVEISAKFGKNIQELLETVLLVAEMEELKADADVRAIGTVIEARLDKGKGAVATLLVQ 626 
           EFVEISAKF KNI ELLETVLLVAE+EELKAD VRAIGTVIEARLDKGKGA+ATLLVQ 
Sbjct: 593 CEFVEISAKFNKNIDELLETVLLVAEWELKADPTVRAIGTVIEARLDKGKGAIATLLVQ 652 

Query: 627 QGTLOTQDPIWGOTFGRVRAMTOTDLGRRVKVAGPSTPVSITGLNEAPMAGDHFAVYADE 686 
           QGTL+VQDPIWGNTFGRVRAM NDLGRRVK A PSTPVSITGLNE PMAGDHFAVYADE 
Sbjct: 653 QGTLHVQDPIWGNTFGRVRAMVNDLGRRVKSAEPSTPVSITGLNETPMAGDHFAVYADE 712 

Query: 687 KAARAAGEERAKRALLKQRQNTQRVSLENLFDTLKAGEVKSVNVIIKADVQGSVEALAAS 746 
           KAARAAGEER+KRALLKQRQOTQRVSL+MiFDTLKAGE+K+VNVIIKADVQGSVEALAAS 
Sbjct: 713 KAARAAGEERSKRALIiKQRQNTQRVSLDNLFDTLKAGEIKTVNVIIKADVQGSVEALAAS 772 

Query: 747 LLKIDVEGVKVNVVHSAVGAINESDWLAEASNAVIIGFNVRPTPQARQQADADDVEIRQ 806 
           L+KI+VEGV+VNVVHSAVGAINESDVTLAEASNAVIIGFNVRPTPQARQQAD DDVEIR 
Sbjct: 773 LVKIEVEGWVNVVHSAVGAINESDVTIAEASNAVIIGFNVRPTPQARQQADTDDVEIRL 832 

Query: 807 HSIlYKVIEETOEAMKGKLDPEYQEKIlJGEAIIRETFKVSKVGTIGGFMVINGKVTRDSS 866 
           HSIIYKVIEEVEEAMKGKLDP YQEKILGEAIIRETFKVSKVGTIGGFMVINGKVTRDSS 
Sbjct: 833 HSIIYKVIEEVEEAMKGKLDPVYQEKILGEAIIRETFKVSKVGTIGGFMVINGKVTRDSS 892 

Query: 867 VRVIRDGWIFDGKIASLKHYKDDVKEVGmQEGGLMIENYNDLKEDDTIEAYIMEEIKRK 927 
           VRVIRD WI FDGKLASLKHYKDDVKEVGNAQEGGLMIEN+NDLK DDTIEAYIMEEI RK 
Sbjct: 893 VRVIRDSVVIFDGKIASLKHYKDDVKEVGNAQEGGLMIENFNDLKVDDTIEAYIMEEIVRK 953 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1127 

A DNA sequence (GBSx1202) was identified in S.agalactiae <SEQ ID 3491> which encodes the amino 
acid sequence <SEQ ID 3492>. This protein is predicted to be ribosome binding factor A (rbfA). Analysis 
of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2557 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9567> which encodes amino acid sequence <SEQ ID 9568> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3493> which encodes the amino acid 
sequence <SEQ ID 3494>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4765 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 93/117 (79%), Positives = 103/117 (87%) 

Query: 8 LIMANHRIDRVGMEIKREViraiLRLRVNDPRVQDVTITDVQMLGDLSMAKVFYTIHSTIiA 67 
         + MANHRIDR VGME I KRE VN+ 1 L+ +V DPRVQ VTIT+VQM GDLS+AKV+YTI S LA 
Sbjct: 1 MAMANHRIDRVGMEIKREViroiLQKEVRD^^ 60 

Query: 68 SDNQKAQIGLEKATGTIKRELGKNLTMYKIPDLQFVKDESIEYGNKIDEMLRNLDKK 124 
          SDNQKAQ GLEKATGTIKRELGK LTMYKIPDL F KD SI YGNKID++LR+LD K 
Sbjct: 61 SDNQKAQTGLEKATGTIKRELGKQLTMYKIPDLVFEKDNSIAYGNKIDQLLRDLDNK 117 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1128 

A DNA sequence (GBSx1203) was identified in S.agalactiae <SEQ ID 3495> which encodes the amino 
acid sequence <SEQ ID 3496>. This protein is predicted to be an esterase. Analysis of this protein sequence 
reveals the following: 

Possible site: 28 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA79277 GB:M64783 acetyl-hydrolase [Streptomyces hygroscopicus] 
Identities = 58/220 (26%), Positives = 90/220 (40%), Gaps = 8/220 (3%) 

Query: 98 WNDNGKANQKTIFYLAGGSYLNNPTPYHISMLKTLSTSLDAKIILPIYPKTPRYTYDYAI 157 
          W + + +T+ YL GGSY H + L + A++ Y+P +A+ 
Sbjct: 58 WVRPARQDGRTLLYLHGGSYALGSPQSHRHLSSALGDAAGAAVLALHYRRPPESPFPAAV 117 

Query: 158 PRLVNLYRHFHEKN ANLTLMGDSAGGGLALGLAHALSHQSGQEAIPQPKNIILLSPW 214 
           V YR E+ +TL GDSAG GLA+ AL P P + +SPW 
Sbjct: 118 EDAVAAYRMLLEQGCPPGRVTLAGDSAGAGLAVAALQAIiR DAGTPLPAAAVCISPW 173 

Query: 215 LDVTMKHPEIPKYEDTDPILSAWGLARVGEIWANGSNNTNHTYVSPKNAPATKLAPITLF 274 
           D+ + + + +L L R+ E + G+ + H SP + T L P+ + 
Sbjct: 174 ADLACEGASHTTRKAREILLDTADLRRMAERYLAGT-DPRHPLASPAHGDLTGLPPLLIQ 232 

Query: 275 TGTREIFFPDIRDYAAQLQAANHPVNYIAQEGMNHVYPIY 314 
           G+ E+ D R A PV + M HV+ Y 
Sbjct: 233 VGSEEVLHDDARALEQAALKAGTPVTFEEWPEMFHVWHWY 272 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3497> which encodes the amino acid 
sequence <SEQ ID 3498>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 244/334 (73%), Positives = 280/334 (83%), Gaps = 6/334 (1%) 

Query: 1 MKPSFKKLLLLFSIITILSIACTPHAKASGRSWKSWFIEQYFWLKRDKSYYKVQDESSFQ 60 
         +K +K L+ ++ L + TP A AS RSWKSWFIEQYFWLKRDKSYY QD+ SFQ 
Sbjct: 1 LKHPIRKTLVTLGLLLTLCLP-TPVA-ASSRSWKSWFIEQYFWLKRDKSYYSKQDDPSFQ 58 

Query: 61 KYLNASREQSDKGYYLDPNSVNGGLVQERLFDMQVYSWNDNGKANQKTIFYLAGGSYLNN 120 
          +YL+A REQSDK Y LD N VNG LVQE L+ MQVYSWNDNGK +QKTI YLAGGSYLNN 
Sbjct: 59 RYLDACREQSDKPYQLDTNLVNGPLVQENLYGMQVYSWNDNGKPDQKTIIYLAGGSYLNN 118 

Query: 121 PTPYHISMLKTLSTSLDAKIILPIYPKTPRYTYDYAIPRLVNLYRHFHEKNANLTLMGDS 180 
           PT YHI+MLKTLSTSLDAKI+LPIYPK PRYTY+Y +P+LVNLY+H++ KN N+ LMGDS 
Sbjct: 119 PTTYHINMLKTLSTSLDAKIVLPIYPKAPRYTYNYTMPKLVNLYQHYYHKNQNVFLMGDS 178 

Query: 181 AGGGLALGLAHALSHQSGQEAIPQPKNIILLSPWLDVTMKHPEIPKYEDTDPILSAWGLA 240 
           AGGGLALGLAHAL + E++PQPK ++LLSPWLDVTM HPEIP+YED DPILS+WGL 
Sbjct: 179 AGGGLALGLAHALHN ESVPQPKQLVLLSPWLDVTMSHPEIPEYEDADPILSSWGLK 234 

Query: 241 RVGEIWANGSNNTNHTYVSPKNAPATKLAPITLFTGTREIFFPDIRDYAAQLQAANHPVN 300 
           RVGE+WA ++NTNH YVSPKN P T L PITLFTGTREIF+PDIRDYAA+L+AANH + 
Sbjct: 235 RVGELWAYSADNTNHIYVSPKNGPITYLPPITLFTGTREIFYPDIRDYAAKLKAANHNIT 294 

Query: 301 YIAQEGMNHVYPIYPIEEAKTAQYQMIDIINKTP 334 
           +I QEGMNHVYPIYPIEEAKTAQYQ+ID INKTP 
Sbjct: 295 FITQEGMNHVYPIYPIEEAKTAQYQIIDAINKTP 328 

A related GBS gene <SEQ ID 8731> and protein <SEQ ID 8732> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: 11.88 
GvH: Signal Score (-7.5): -1.33 
Possible site: 28 
>>> Seems to have a cleavable N-term signal seq. 

ALOM program count: 0 value: 4.03 threshold: 0.0 
PERIPHERAL Likelihood = 4.03 174 
modified ALOM score: -1.31 
*** Reasoning Step: 3 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

28.4/46.2% over 220aa 
Streptomyces hygroscopicus 
EGAD|5925| acetyl-hydrolase Insert characterized 

ORF00486(589 - 1245 of 1602) 
EGAD|5925|5724(57 - 277 of 300) acetyl-hydrolase {Streptomyces hygroscopicus} 
%Match = 6.8 
%Identity = 28.3 %Similarity = 46.1 
Matches = 62 Mismatches = 111 Conservative Sub.s = 39 






462 492 522 552 582 612 642 669 

KRDKSYYKVQDESSFQKYLNASREQSDKGYYLDPNSVNGGLVQERLFDMQVYSWNDNGKANQKTIFYLAGGSY-LNNPTP 



ELELWELIELNVfflTRNGEMEPRRIAYDRAQEAFGNLGVPPGDVVTVGHCTAEWVRPARQDGRTLLYLHGGSYALGSPQS 
20 30 40 50 60 70 80 

696 726 756 786 837 867 897 

Y-HISMLKTLSTSLDAKIILPIYPKTPRYTYDYAIPRLVNLYRHFHEKN ANLTLMGDSAGGGLALGLAHALSHQSGQ 

: |:| | : | :: | = | : [: | I! : |: :|| MM llh Ml 

HRHLS- -SALGDAAGAAW^HYRRPPESPFPAAVEDAVAAYRMLLEC^CT RD 

100 110 120 130 140 150 

927 957 987 1017 1047 1077 1107 1137 

EAI PQPKNI ILLSPWLDVTMKHPEI PKYEDTDPILSAWGL 
|| =111 t = : : =1 | |: | : | = : | || : MM- h i 

AGTPLPAAAVCISPWADLACEGASHTTRKAREILLDTADLRRMAERYIAGTD-PRHPLASPAHGDLTGLPPLLIQVGSEE 
170 180 190 200 210 220 230 

1167 1197 1227 1245 1275 1305 1335 1365 

IFFPDIRDYAAQLQAANHPVNYIAQEGMNHV YPIYPIEEAKTAQYQMIDIINKTP*Y*LSQL*SYKK*TMILTWFI 

== I I 111= III Ml = =1 

VLHDDARALEQAALKAGTPVTFEEWPEMFHVWHWYHPVLPEGRRAA.IEVAGAFLRTATGEGLK 
250 260 270 280 290 300 

SEQ ID 8732 (GBS149) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 23 (lane 6; MW 37kDa). 
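
As a rough plausibility check on an apparent molecular weight read from a gel, a protein's mass can be estimated from its length. The Python sketch below is illustrative only. It uses the common rule of thumb of about 110 Da per residue, which is an assumption and not a figure from this document, and it takes a length of 334 residues from the query numbering of the GAS/GBS alignment in this example, which may differ from the length of the expressed construct. The mass of the His tag is ignored, and the function name `approx_mass_kda` is hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass (assumption)

def approx_mass_kda(n_residues):
    """Estimate protein mass in kDa from the number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0

# 334 is an assumed length taken from the alignment numbering, not a stated value.
estimate = approx_mass_kda(334)
print(f"{estimate:.1f} kDa")
```

An estimate of this kind is only expected to agree with a gel reading to within a few kDa.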

The GBS149-His fusion product was purified (Figure 196, lane 6) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 291), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1129 

A DNA sequence (GBSx1204) was identified in S.agalactiae <SEQ ID 3499> which encodes the amino 
acid sequence <SEQ ID 3500>. This protein is predicted to be CopY. Analysis of this protein sequence 
reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 



WO 02/34771 



-1262- 



PCT/GB01/04789 






Final Results 

bacterial cytoplasm Certainty=0.3140 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG10085 GB:AF296446 CopY [Streptococcus mutans] 
Identities = 67/137 (48%) , Positives = 98/137 (70%) 

Query: 2   TISSAEWEIMRVVWAQQNTTSNEILAVLLEKYDWTPSTVKTLLRRLLDKGYVSREKMGKG 61
           +IS+AEWE+MRVVWA+Q T+S+EI+A+L   Y W+ ST+KTL+ RL +KGY++ ++ G+
Sbjct: 3   SISNAEWEVMRVVWAKQMTSSSEIIAILSRTYCWSASTIKTLITRLSEKGYLTSQRQGRK 62

Query: 62  FSYSPLIDEDLAMMSEVDSVFQKVCQTKHVAIVRHLLESIPMTEKDRLNLQSSLEAKKGK 121
           + YS LI E+ A+  +V  VF ++C TKH A++RHL+E  PMT  D   L++ L +KK
Sbjct: 63  YIYSSLISEEEALEQQVSEVFSRICVTKHQALIRHLVEETPMTLSDIEKLEALLLSKKAN 122

Query: 122 TLERVACNCIPGQCQCH 138
            +  V CNCI GQC C+
Sbjct: 123 AVPEVKCNCIVGQCSCY 139

A related DNA sequence was identified in S.pyogenes <SEQ ID 3501> which encodes the amino acid
sequence <SEQ ID 3502>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2331 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 54/135 (40%) , Positives = 84/135 (62%) 



Query: 3   ISSAEWEIMRVVWAQQNTTSNEILAVLLEKYDWTPSTVKTLLRRLLDKGYVSREKMGKGF 62
           IS+AEWE+MRVVWA  +  S++I+ +L +KY W+ ST+KTL+ RL+ K +++  + G+ +
Sbjct: 10  ISAAEWEVMRVVWASGDIKSSDIITILRKKYQWSDSTIKTLIGRLVKKNFLTSYRQGRAY 69

Query: 63  SYSPLIDEDLAMMSEVDSVFQKVCQTKHVAIVRHLLESIPMTEKDRLNLQSSLEAKKGKT 122
            Y  L+DE L     + +V   +CQ +H  ++   L  +PMT ++    Q  LE KK
Sbjct: 70  IYQALLDETLLQKEALATVLDGICQRQHTRLLLERLYHLPMTLEEIGAFQELLEVKKENA 129

Query: 123 LERVACNCIPGQCQC 137
           +  V CNC+PGQC C
Sbjct: 130 VLEVPCNCLPGQCHC 144

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
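
The alignment summaries in these examples report identities and positives as a count over the aligned length. As a minimal sketch only, assuming summary lines of the form "Identities = 67/137 (48%) , Positives = 98/137 (70%)" as printed above, the counts can be extracted and the fractions recomputed as follows. The function name parse_alignment_summary is hypothetical and is not part of any alignment tool.

```python
import re

# Matches "Identities = 67/137", "Positives = 98/137", "Gaps = 1/214", etc.
_SUMMARY = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line):
    """Return {field: (count, aligned_length, fraction)} for one summary line.

    Hypothetical helper: it only reads the counts printed in the summary
    line and does not inspect the alignment rows themselves.
    """
    fields = {}
    for name, count, length in _SUMMARY.findall(line):
        count, length = int(count), int(length)
        fields[name] = (count, length, count / length)
    return fields


summary = parse_alignment_summary(
    "Identities = 67/137 (48%) , Positives = 98/137 (70%)"
)
```

Here summary["Identities"] holds the pair 67 and 137 together with their ratio; fields absent from the line (such as Gaps in this case) are simply not returned.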

Example 1130

A DNA sequence (GBSx1206) was identified in S.agalactiae <SEQ ID 3503> which encodes the amino
acid sequence <SEQ ID 3504>. This protein is predicted to be CopA. Analysis of this protein sequence
reveals the following:

Possible site: 19
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.82 Transmembrane 382 - 398 ( 370 - 406)
INTEGRAL Likelihood = -8.01 Transmembrane 356 - 372 ( 344 - 374)
INTEGRAL Likelihood = -2.50 Transmembrane 719 - 735 ( 719 - 738)
INTEGRAL Likelihood = -2.28 Transmembrane 202 - 218 ( 202 - 218)
INTEGRAL Likelihood = -1.59 Transmembrane 693 - 709 ( 691 - 712)
INTEGRAL Likelihood = -1.33 Transmembrane 167 - 183 ( 167 - 183)

Final Results

bacterial membrane Certainty=0.4927 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG10086 GB:AF296446 CopA [Streptococcus mutans] 
Identities = 440/740 (59%) , Positives = 571/740 (76%) , Gaps = 1/740 (0%) 

Query: 5 KETFLIDGMTCASCALTIEKAWKLDHVDSAVVNIATEKMTVTFDDTTLSPNVIEECVSE 64 
+E FLIDGMTCASCA+ +E AV KLD ++SAWNL TEKMT+ +D +S + + V+

Sbjct: 3 EEVFLIDGMTCASCAINVENAWKLDGIESAVWLTTEKMTIDYDAAKVSEADVTKAVAG 62 

Query: 65 SGYEASLFKEETSKSQSERHQLAIEKMWHRFWMSAVATIPLLYISMGPMINLWLPSFLMP 124 
+GY A ++ T++SQ +R + + + R +++ TIPL YI+MG M+ L LP+FL P 
Sbjct: 63 AGYGAKVYDPTTAESQKDREEHKLAGIKKRLLWTSIFTIPLFYIAMGSMVGLPLPNFLAP 122

Query: 125 DKGPLNYGMIQLLLTLPVMYFGRIFYQNGFKALFKRHPNMDSLVAIATTAAFIYSLYGLY 184 

PL Y M+ LLLT+PV+ FY NGF++LFK HPNMDSLV++ATTAAF+YSLYG Y 

Sbjct: 123 SSAPLTYAMVLLLLTIPVIVLSWSFYDNGFRSLFKGHPNMDSLVSLATTAAFLYSLYGTY 182 


Query: 185 EILQGDIHYAHQLYFESVAVILTLITLGKYFEILSKGRTSASIEKLLTLSAKEARVIKDG 244 

+ G H+AH LY+ES VAVI LTL I TLGKYFE LSKGRTS +I+KL+ LSAKEA +I+DG 
Sbjct: 183 HVYLGHTHHAHHLYYESVAVILTLITLGKYFETLSKGRTSDAIKKLMHLSAKEATLIRDG 242 

Query: 245 EDYMVPLDKVKIGETILVKPGEKIPLDGHWAGESSIDESMLTGESIPVEKKVGSKVYGA 304

E+ VP+++V+I + ILVKPGEKIP+DG V++G S+IDESMLTGESIP+EK S VY 
Sbjct: 243 EEIKVPIEQVQIRDQILVKPGEKIPVDGRVLSGHSAIDESMLTGESIPIEKMADSPVYAG 302 

Query: 305 S INGQGSLTIFVEKEAGGSLLSQI INLVEAAQTSKAPIANIiADKVEGVFVPFVIVIAILS 364 
SINGQGSLT EK +LLSQII LVE AQ +KAPIA +ADKVS VFVP +I IAIL+

Sbjct: 303 SINGQGSLTFEAEKVGNETLLSQIIKLVENAQQTKAPIAKIADKVSAVFVPVIITIAILT 362 

Query: 365 GLSWYLILGQSFAFSLKIMIAVLVIACPCALGLATPTAIMVASGKAAENGILFKGGEVLE 424 
GL WY ++GQ F FS+ I +AVLVIACPCALGLATPTAIMV +G+AAENGIL+K G+VLE 
Sbjct: 363 GLFWYFVMGQDFTFSMTISVAVLVIACPCALGLATPTAIMVGTGRAAENGILYKRGDVLE 422

Query: 425 KAHHIDTIVFDKTGTLTKGKPEWAIKTYGGDKEEFLGQVASVEKLSNHPLSQTIVNJCAK 484 

AH I +TIVFDKTGT+T+GKPEW +Y D+ + + A++E LS HPLSQ IV+ AK 
Sbjct: 423 LAHQINTIVFDKTGTITQGKPEWHQFSY-HDRTDLVQVTAALEALSEHPLSQAIVDYAK 481 


Query: 485 EKELPLREVmFKNILGYGLSATINGKTMLVGIffiNmTKNDVNLDLAKADIEIAQEEAQT 544 

++ L V F ++ G GL + +T+LVGN LM + +++L+ A+AD + A + QT 
Sbjct: 482 KEGTHLLAVDDFTSLTGLGLKGCVADETLLVGNEKLMRQANISLEQAQADFKAATAQGQT 541 

Query: 545 WYVSENGVLSGLITLTDQLKTDSQETVKQLQRLGFNLVLLTGDNKASADAIAQKLGITT 604

++V+ +G L GLIT+ D++K DS TVK LQ +G + +LTGDN+ +A AIA+++GIT 
Sbjct: 542 PIFVASDGQLLGLITIADKVKNDSAATVKALQNMGVEVAMLTGDNEETAQAIAKEVGITF 601 

Query: 605 WSEVLPDQKANVILELKEKGGQIAMVGDGINDAPALASSDVGISMSSGTDIAIESADIV 664 
V+S+V +K IL+L+ +G ++AMVGDGINDAPALA++D+GISM SGTDIA+ESADIV

Sbjct: 602 VISQVFSQEKTQAILDLQAEGKKVAMVGDGINDAPALATADIGISMGSGTDIAMESADIV 661 

Query: 665 LMKPELTDLLKAMTISKQTIQIIKENLFWAFFYNVLAIPVAMGVLHLFGGPLLNPMLAGL 724
           LMKP + D++KA+ IS+ TI  IKENLFWAF YNVL++P+AMGVL+LFGGPLL+PM+AGL
Sbjct: 662 LMKPAMLDIIKALKISRVTIINIKENLFWAFIYNVLSVPIAMGVLYLFGGPLLDPMIAGL 721

Query: 725 AMAFSSVSVVLNALRLKVLK 744
           AM+FSSVSVVLNALRLKV+K
Sbjct: 722 AMSFSSVSVVLNALRLKVVK 741






There is also homology to SEQ ID 3506.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1131 

A DNA sequence (GBSx1207) was identified in S.agalactiae <SEQ ID 3507> which encodes the amino
acid sequence <SEQ ID 3508>. This protein is predicted to be cation-transporting ATPase, P-type (pacS).
Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1934 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG10087 GB:AF296446 CopZ [Streptococcus mutans] 
Identities = 31/67 (46%) , Positives = 43/67 (63%) 

Query: 1   MKHTYRVSGMKCDGCAKTVSDKLSSVIGVDEVNVDLTKNQVVVSGKTFKWLLKRSLKDTK 60
           M+ TY + G+KC GCA V+ + S + V++V VDL K +V ++G KW LKR+LK T
Sbjct: 1   MEKTYHIDGLKCQGCADNVTKRFSELKKVirovKVDLDKKETO 60

Query: 61  YSLEEEI 67
           Y L  EI
Sbjct: 61  YELGAEI 67

A related DNA sequence was identified in S.pyogenes <SEQ ID 3509> which encodes the amino acid 
sequence <SEQ ID 3510>. Analysis of this protein sequence reveals the following: 

Possible site: 18
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2997 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 33/63 (52%) , Positives = 48/63 (75%) 

Query: 1   MKHTYRVSGMKCDGCAKTVSDKLSSVIGVDEVNVDLTKNQVVVSGKTFKWLLKRSLKDTK 60
           M+  Y+V+GM CDGCA+TV++KLS+V GV  V V+L K +  V+G+   +L+KR+LKDTK
Sbjct: 1   MEKHYQVTGMTCDGCARTVTEKLSAVPGVQSVQVNLEKGEAKVTGRPLTFLIKRALKDTK 60

Query: 61  YSL 63
           + L
Sbjct: 61  FEL 63

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
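
Each "Final Results" block above lists three candidate localisations with a certainty value, and the predicted localisation is the one marked "(Affirmative)". As a rough sketch, assuming lines of the form "bacterial cytoplasm Certainty=0.1934 (Affirmative)", the highest-certainty entry can be selected as below. The helper names parse_final_results and best_localisation are hypothetical, and the parser tolerates stray spaces inside the number.

```python
import re

# "bacterial <location> Certainty=<value> (<label>)"; spaces inside the
# value (e.g. "0 . 1934") are tolerated and removed before conversion.
_LOCALISATION = re.compile(
    r"bacterial\s+(\w+)\s+Certainty\s*=\s*([\d.\s]+?)\s*\((.+?)\)"
)


def parse_final_results(lines):
    """Return a list of (location, certainty, label) tuples (hypothetical helper)."""
    results = []
    for line in lines:
        match = _LOCALISATION.search(line)
        if match:
            certainty = float(match.group(2).replace(" ", ""))
            results.append((match.group(1), certainty, match.group(3)))
    return results


def best_localisation(lines):
    """Return the entry with the highest certainty, or None if nothing parsed."""
    results = parse_final_results(lines)
    return max(results, key=lambda entry: entry[1]) if results else None


example_block = [
    "bacterial cytoplasm Certainty=0.1934 (Affirmative) < succ>",
    "bacterial membrane Certainty=0.0000 (Not Clear) < succ>",
    "bacterial outside Certainty=0.0000 (Not Clear) < succ>",
]
prediction = best_localisation(example_block)
```

For the block of Example 1131 this selects the cytoplasm entry, which agrees with the line marked "(Affirmative)".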

Example 1132

A DNA sequence (GBSx1208) was identified in S.agalactiae <SEQ ID 3511> which encodes the amino
acid sequence <SEQ ID 3512>. Analysis of this protein sequence reveals the following:

Possible site: 20
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.59 Transmembrane 67 - 83 ( 65 - 90)
INTEGRAL Likelihood = -3.72 Transmembrane 35 - 51 ( 31 - 51)
INTEGRAL Likelihood = -3.61 Transmembrane 122 - 138 ( 120 - 139)
INTEGRAL Likelihood = -1.59 Transmembrane 154 - 170 ( 154 - 171)

Final Results

bacterial membrane Certainty=0.4036 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8733> which encodes amino acid sequence <SEQ ID 8734> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: 4.09
GvH: Signal Score (-7.5): 3.87
Possible site: 20
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 4 value: -7.59 threshold: 0.0
INTEGRAL Likelihood = -7.59 Transmembrane 65 - 81 ( 63 - 88)
INTEGRAL Likelihood = -3.72 Transmembrane 33 - 49 ( 29 - 49)
INTEGRAL Likelihood = -3.61 Transmembrane 120 - 136 ( 118 - 137)
INTEGRAL Likelihood = -1.59 Transmembrane 152 - 168 ( 152 - 169)
PERIPHERAL Likelihood = 0.85 96
modified ALOM score: 2.02

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.4036 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15351 GB:Z99121 similar to hypothetical proteins [Bacillus subtilis]

Identities = 107/192 (55%) , Positives = 137/192 (70%) 

Query: 8 WNILSLVGTVAFASSGAIVAIEEEFDILGLFILGFVTAFGGGAIRNVLIGLPIETLWSQG 67 
W +LS++G +AFA SGAI VA+EEE+DILG++ILG VTAFGGGAIRN+LIG+P+ LW QG 
Sbjct: 3 WELLSVIGIIAFAVSGAIVAMEEEYDILGVYILGIVTAFGGGAIRNLLIGVPVSALWEQG 62

Query: 68 IAFYAAAAAILFIMIFPNLLSGKGRDAEWSDAIGLAAFSVQGALYATQSHQPLSAVIVA 127 

F A +1 + +FP LL +SDAIGLAAF++QGALYA + PLSAVIVA 

Sbjct: 63 AYFQIALLSITIVFLFPKLLLKHWNKWGNLSDAIGLAAFAIQGALYAVKMGHPLSAVIVA 122 






Query: 128 AVLTGAGGGIVRDVLAGRKPGVLRSEIYAGWSILVGIILYFKIAKTTTDYYLLVLWTSL 187 

AVLTG+GGGI +RD+LAGRKP VL++EIYA W+ L G+I+ + Y+L V+ 

Sbjct: 123 AVLTGSGGGI IRDLLAGRKPLVLKAE I YAWAALGGL I VGLGWLGNS FGLYVLFFVLWC 182 



Query: 188 RMLGYKKQWHLP 199

R+ Y W LP 
Sbjct: 183 RVCSYMFNWKLP 194 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3513> which encodes the amino acid
sequence <SEQ ID 3514>. Analysis of this protein sequence reveals the following:

Possible site: 27
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.15 Transmembrane 70 - 86 ( 65 - 88)
INTEGRAL Likelihood = -4.09 Transmembrane 33 - 49 ( 29 - 49)
INTEGRAL Likelihood = -2.13 Transmembrane 120 - 136 ( 119 - 137)
INTEGRAL Likelihood = -0.43 Transmembrane 173 - 189 ( 172 - 189)



Final Results

bacterial membrane Certainty=0.3060 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB05428 GB:AP001512 unknown conserved protein [Bacillus halodurans] 
Identities = 109/195 (55%) , Positives = 137/195 (69%) 

Query: 6 WEILNIIGTIAFALSGAIVAMEEEFDILGIFILGFVTAFGGGAIRNTLIGIjPIEALWGQK 65 
W++LN+IGTIAFALSG IVAMEE+FD++G++ILGFVTAFGGGAIRN LIG+P+ ALW Q

Sbjct: 3 VroVLNVIGTIAFALSGVIVAMEEDFDLMGVYILGFVTAFGGGAIRNLLIGVPVSALWEQG 62 

Query: 66 PEFTCAFFAMVLIMLFPKLMARGWvRAAvLTDAIGLAAFSVQGALHAVRLNQPLSAVIVT 125 
FT AF M + PL W++ +L DAIGLAAF++QGAL A ++ PLSAVIV 
Sbjct: 63 TLFTIAFIVMTIAFFLPNLWINHWLKFGLLFDAIGLAAFAIQGALFATSMDHPLSAVIVA 122

Query: 126 AVIiTGAGGGVVRDILAGRKPSvLRSEIYAGWSILAAIVLHFKLADSTIECYALvvIiLTTL 185 

A LTGAGGG+VRD+LA RKP VL EIY GW++LA + + I , L++L+ L 
Sbjct: 123 AALTGAGGGITODMLARRKPLVLSKEIYIGWAMIAGAAIGLNIVSGPIGIGFLIILWFL 182 


Query: 186 RMIGNRKKWNLPKIK 200 

RM+ W LP K 

Sbjct: 183 RMLSVHYNWCLPHRK 197 

An alignment of the GAS and GBS proteins is shown below.

Identities = 133/200 (66%) , Positives = 168/200 (83%) 

Query: 3 MSIDIWNILSLVGTVAFASSGAIVAIEEEFDILGLFILGFVTAFGGGAIRNVLIGLPIET 62 
M+ID+W IL+++GT+AFA SGAIVA+EEEFDILG+FILGFVTAFGGGAIRN LIGLPIE 
Sbjct: 1 MTIDMWEILNIIGTIAFALSGAIVAMEEEFDILGIFILGFVTAFGGGAIRNTLIGLPIEA 60

Query: 63 LWSQGIAFYAAAAAILFIMIFPNLLSGKGRDAEWSl^IGIiAAFSVQGALYATQSHQPLS 122 

LW Q FA A++ IM+FP L++ A V++DAIGLAAFSVQGAL+A + +QPLS 

Sbjct: 61 LWGQKPEFTCAFFAIWLIMLFPKLMARGWVRAAvLTDAIGLflAFSVQGALHAVRLNQPLS 120 


Query: 123 AVIVAAVLTGAGGGIVRDVLaGRKPGVLRSEIYAGWSILVGIILYFKIAKTTTDYYLLVL 182 

AVIV AVLTGAGGG+VRD+LAGRKP VLRSE I YAGWS I L I+L+FK+A +T + Y LV+ 
Sbjct: 121 AVIOTAVLTGAGGGVTODIIAGRKPSVLRSEIYAGWSILAAIVLHFKLADSTIECTALW 180 

Query: 183 WTSLRMLGYKKQWHLPWR 202

++T+LRM+G +K+W+LP ++ 
Sbjct: 181 LLTTLRMIGNRKKWNLPKIK 200 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1133 

A DNA sequence (GBSx1209) was identified in S.agalactiae <SEQ ID 3515> which encodes the amino
acid sequence <SEQ ID 3516>. Analysis of this protein sequence reveals the following:

Possible site: 42
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2805 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9569> which encodes amino acid sequence <SEQ ID 9570>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB94816 GB:AJ245582 hypothetical protein [Streptococcus thermophilus] 
Identities = 138/238 (57%) , Positives = 184/238 (76%) 



Query: 5   KKMIKLIAIDMDGTLLNDEKKIPKENIQAIKEATQAGIKIVLCTGRPMSGILPYFNELGL 64
           +  +KLIAIDMDGTLLN +K+IPKENI+AI+EAT AGIKIVLCTGRP SGI+P+F +LGL
Sbjct: 3   QNQVKLIAIDMDGTLLNSQKEIPKENIKAIQEATAAGIKIVLCTGRPRSGIVPHFEKLGL 62

Query: 65  TKEEYIIMNNGCSTYSTKDWQLIDSATLTHDELIFLEEVVKEFPNVCLTLTAENTFYAVG 124
           ++EE+IIMNNGCSTY TK+W L++S +L+  E+  L +  ++FP V LT T E ++Y VG
Sbjct: 63  SEEEFIIMNNGCSTYETKNWTLLESESLSRSEMEELLQACEDFPGVALTFTGEKSYYVVG 122

Query: 125 EEVPEIVAYDADLVFTKAKSTSLDALRNQEEIVFQAMYMGLDADVTAFQEAVEEALISKF 184
            EVPE+VAYDA  VFT+AK+ SL+ +  + +++FQAMYM     + AFQ AV++ L   +
Sbjct: 123 NEVPELVAYDAGTVFTEAKARSLEEIFEEGQVIFQAMYMAESEPLDAFQNAVQDRLDQSY 182

Query: 185 SGVRSQDYIYEIMPQGVTKARGLKSLIAKLGLDINQVMAIGDAPNDIELLDLVPNSVA 242
           S VRSQ+YI+E+MPQG TKA GLK L  KL ++ +Q+MA+GDA ND+E+L  V  SVA
Sbjct: 183 STVRSQEYIFEVMPQGATKASGLKHLAEKLDINRDQIMALGDAANDLEMLQFVGQSVA 240



A related DNA sequence was identified in S. pyogenes <SEQ ID 3517> which encodes the amino acid 
sequence <SEQ ID 3518>. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1468 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 152/270 (56%), Positives = 193/270 (71%) 



Query: 6   KMIKLIAIDMDGTLLNDEKKIPKENIQAIKEATQAGIKIVLCTGRPMSGILPYFNELGLT 65
           +MI+LIAID+DGTLLN +K+IPKENI AI+EA Q+G+KIVLCTGRP SG  PYF++LGLT
Sbjct: 19  RMIQLIAIDLDGTLLNQDKQIPKENITAIQEAAQSGLKIVLCTGRPQSGTRPYFDQLGLT 78

Query: 66  KEEYIIMNNGCSTYSTKDWQLIDSATLTHDELIFLEEVVKEFPNVCLTLTAENTFYAVGE 125
           +EE++I+NNGCSTYS+ DWQL  S  L   ++  LEE+ + FP++ LTLT EN +  + E
Sbjct: 79  QEEFLIINNGCSTYSSPDWQLRHSKMLKVSDIELLEELSQSFPDIYLTLTEENDYLVLEE 138

Query: 126 EVPEIVAYDADLVFTKAKSTSLDALRNQEEIVFQAMYMGLDADVTAFQEAVEEALISKFS 185
           EVP++V  D DLVFT  K  SL  L +   ++FQAMY+G  A + AF+ AV   L   F
Sbjct: 139 EVPDLVQEDGDLVFTIVKPVSLAELSDTPRLIFQAMYLGEKAALDAFERAVRNQLSQSFH 198

Query: 186 GVRSQDYIYEIMPQGVTKARGLKSLIAKLGLDINQVMAIGDAPNDIELLDLVPNSVAMGN 245
            VRSQD I EI+PQGV+KA  LK L+  LGL  +QVMAIGDAPNDIE+L      VAM N
Sbjct: 199 VVRSQDNILEILPQGVSKASALKELVEDLGLTADQVMAIGDAPNDIEMLTYAGLGVAMEN 258

Query: 246 ASDEIKSRCKYITVDNNKAGVAKAIYDYAL 275
           AS  IK     +T+ N+ AGVA+AI  +AL
Sbjct: 259 ASAAIKPLADKVTLTNDMAGVAQAIRQFAL 288





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1134 

A DNA sequence (GBSx1210) was identified in S.agalactiae <SEQ ID 3519> which encodes the amino
acid sequence <SEQ ID 3520>. Analysis of this protein sequence reveals the following:

Possible site: 18
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.43 Transmembrane 7 - 23 ( 7 - 23)

Final Results

bacterial membrane Certainty=0.1171 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA26954 GB:J04479 DNA polymerase I [Streptococcus pneumoniae]

Identities = 655/879 (74%) , Positives = 748/879 (84%) , Gaps = 4/879 (0%) 

Query: 3 NKNKLLLIDGSSVAFRAFFALYNQIDRFKNNSGLHTNAIYGFHLMLNHILGRVQPSHILV 62 
+K KLLLIDGSSVAFRAFFALY Q+DRFKN +GLHTNAIYGF LML+H+L RV+PSHILV 
Sbjct: 2 DKKKLLLIDGSSVAFRAFFALYQQLDRFKNAAGLHTNAIYGFQLMLSHLLERVEPSHILV 61

Query: 63 AFDAGKTTFRTEMYADYKGGRAKTPDEFREQFPYIRQQLDVLGIKHYELEHYEADDIIGT 122 

AFDAGKTTFRTEMYADYKGGRAKTPDEFREQFP+IR+ LD +GI+HYEL YEADDIIGT 
Sbjct: 62 AFDAGKTTFRTEMYADYKGGRAKTPDEFREQFPFIRELLDHMGIRHYELAQYEADDIIGT 121 


Query: 123 LAKOAEASNEHFDITWSGDKDLIQLTDTNTVVEISKKGVAEFEEFTPAYLMEKMGITPS 182 

L K AE + FDIT+VSGDKDLIQLTD +TWEISKKGVAEFE FTP YLME+MG+TP+ 
Sbjct: 122 LDKLAE- -QDGFDITIVSGDKDLIQLTDEHTWEISKKGVAEFEAFTPDYLMEEMGLTPA 179 

Query: 183 QFIDLKALMGDKSDNIPGVTKIGEKTGLKLLSEYGSLEGIYENIEAMKQSKMKENLINDK 242

QFIDLKALMGDKSDNIPGVTK+GEKTG+KLL E+GSLEGI YENI + MK SKMKENLINDK 
Sbjct: 180 QFIDLKALMGDKSDNIPGVTKVGEKTGIKLLLEHGSLEGIYENIDGMKTSKMKKNLINDK 239 

Query: 243 EQAFLSKTLATINIASPITIGLEDILYSGPQDIKALSQFYDEMDFKQFKAALGEETSQED 302
EQAFLSKTLATI+ +PI IGLED++YSGP D++ L +FYDEM FKQ K AL ++

Sbjct: 240 EQAFLSKTLATIDTKAPIAIGLEDLVYSGP-DVENLGKFYDEMGFKQLKQALNMSSADVA 298 

Query: 303 FEVDFTEVEQLKTEMFSDNDFYYFEMLGDNYHVEDLIGIAWGNSDTIYATSNVSLLQEAL 362 
+DFT V+Q+ +M S+ ++FE+ G+NYH ++L+G AW D +YAT + LLQ+ + 
Sbjct: 299 EGLDFTIVDQISQDMLSEESIFHFELFGENYHTDNLVGFAWSCGDQLYATDKLELLQDPI 358

Query: 363 FKKALSKP-IKTYDFKRSKVLLNRFNIDLPEPAFDTRLAKYLLSTTEDNLVSTIARLYTN 421 

FK L K ++ YDFK+ KVLL RF +DL PAFD RLAKYLLST EDN ++TIA LY 
Sbjct: 359 FKDFLEKTSLRVYDFKKVKVLLQRFGVDLQAPAFDIRLAKYLLSTVEDNEIATIASLYGQ 418 


Query: 422 LPLDTDDAWGKGAKRAIPEKTRFLEHIAKCTKVLVDSEANIMQQLKANEQEELLFEMEQ 481 

L D+ YGKG K+AIPE+ +FLEHLA K+ VLV++E ++++L N Q ELL++MEQ 
Sbjct: 419 TYLVDDETFYGKGVKKAIPEREKFLEHLACKLAVLVETEPILLEKLSENGQLELLYDMEQ 478 

Query: 482 PLANVIiAK^IRGIKVKKNTLNE^IENQKVIETLTQEIYEIAGQEFNINSPKQLGKLLF 541

PLA VLAKMEI GI VKK TL EM EN+ VIE LTQEIYELAG+EFN+NSPKQLG LLF 
Sbjct: 479 PLAFVIAKMEIAGIVVKKETLLEMQAENELVIEKLTQEIYELAGEEFIWNSPKQLGVLLF 538 

Query: 542 ETLGLPVEMTKKTKTGYSTAVDVLERLAPISPLVTKILEYRQITKLQSTYIIGLQDYILE 601 
E LGLP+E TKKTKTGYSTAVDVLERLAPI+P+V KIL+YRQI K+QSTY+IGLQD+IL

Sbjct: 539 EKLGLPLEYTKKTKTGYSTAVDVLERLAPIAPIVKKILDYRQIAKIQSTYVIGLQDWILA 598 

Query: 602 DGKIHTRYVQDLTQTGRLSSSDPNLQNIPVRLEQGRLIRKAFVPSEDNAVLLSSDYSQIE 661 
DGKIHTRYVQDLTQTGRLSS DPNLQNIP RLEQGRLIRKAFVP +++VLLSSDYSQIE 
Sbjct: 599 DGKIHTRYVQDLTQTGRLSSVDPNLQNIPARLEQGRLIRKAFVPEWEDSVLLSSDYSQIE 658

Query: 662 LRVLAHISKDEHLIAAFKEGADIHTSTAMRVFGIEKPENVTPNDRRNAKAVNFGIVYGIS 721 

LRVLAHI SKDEHLI AF+EGADIHTSTAMRVFGIE+P+NVT NDRRNAKAVNFG+ VYGI S 
Sbjct: 659 LRVLAHISKDEHLIKAFQEGADIHTSTAiVIRWGIERPDNVTANDRRNAKAVNFGVVYGIS 718 


Query: 722 DFGLSHNLGIPRKLAKQYIDTYFERYPGIKNYMETVVREAKDKGYVETLFHRRRSLPDIN 781 

DFGLS+NLGI RK AK YIDTYFER+PGIKNYM+ WREA+DKGYVETLF RRR LPDIN 
Sbjct: 719 DFGLSNNLGISRKEAKAYIDTYFERFPGIKNYMDEVVREARDKGYVETLFKRRRELPDIN 778 

Query: 782 SRNFNIRQFAERTAINSPIQGSAADILKIAMINLDRVLDKGGYKSKMLLQVHDEIVLEVP 841

SRNFNIR FAE TAINSPIQGSAADILKIAMI LD+ L GGY+ +KMLLQVHDEI VLEVP 
Sbjct: 779 SRNFNIRGFAEATAINSPIQGSAADILKIAMIQLDKALVAGGYQTKMLLQVHDEIVLEVP 838

Query: 842 NEEIGAIRELVTKTMESAISLSVPLIADENAGETWYEAK 880

E+ +++LV +TME AI LSVPLIADEN G TWYEAK
Sbjct: 839 KSELVEMKKLVKQTMEEAIQLSVPLIADENEGATWYEAK 877

A related DNA sequence was identified in S.pyogenes <SEQ ID 3521> which encodes the amino acid 
sequence <SEQ ID 3522>. Analysis of this protein sequence reveals the following: 

Possible site: 18

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.43 Transmembrane 7 - 23 ( 7 - 23)

Final Results

bacterial membrane Certainty=0.1171 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 665/881 (75%) , Positives = 761/881 (85%) , Gaps = 2/881 (0%) 

Query: 1 MTNKNKLLLIDGSSVAFRAFFALYNQIDRFKNNSGLHTNAIYGFHLMLNHILGRVQPSHI 60

M NKNKLLLIDGSSVAFRAFFALYNQIDRFKN+SGLHTNAIYGFHLML+H++ RVQP+H+ 
Sbjct: 1 MENKNKLLLIDGSSVAFRAFFALYNQIDRFKNHSGLHTNAIYGFHLMLDHMMKRVQPTHV 60 

Query: 61 LVAFDAGKTTFRTEMYADYKGGRAKTPDEFREQFPYIRQQLDVLGIKHYELEHYEADDII 120 
LVAFDAGKTTFRTEMYADYK GRAKTP+EFREQFPYIR+ L LGI +YELEHYEADDII

Sbjct: 61 LVAFDAGKTTFRTEMYADYKAGRAKTPEEFREQFPYIREMLTALGIAYYELEHYEADDII 120 

Query: 121 GTLAKQftEASNEHFDITWSGDKDLIQLTDTNTVvEISKKiGvJVEFEEFTPAYLMEKMGIT 180 
GTL K AE + FD+T+VSGDKDLIQLTD NTWEISKKGvAEFEEFTPAYLMEKMG+T 
Sbjct: 121 GTLDKMAERTEVPFDOTIVSGDKDLIQLTDENTVVEISKKGVAEFEEFTPAYLMEKMGLT 180

Query: 181 PSQFIDLKALMGDKSDNIPGVTKIGEKTGLKLLSEYGSLEGIYENIEAMKQSKMKENLIN 240 

P+QFIDLKALMGDKSDNIPGVTKIGEKTGLKLL E+GSLEGIYE+I+ K SKMKENLIN 
Sbjct: 181 PNQFIDLKALMGDKSDNIPGVTKIGEKTGLKLLHEFGSLEGIYEHIDGFKTSKMKENLIN 240 


Query: 241 DKEQAFLSKTLAT INI AS PI T IGLED I LYSGPQDI KALSQFYDEMDFKQFKAALGEETSQ 300 

D++QAFLSKTLATIN ASPITIGL+DI+Y+GP D+ +LSQFYDEMDF Q K L + Q 
Sbjct: 241 DRDQAFLSKTLATINTASPITIGLDDI VYNGP-DVASLSQFYDEMDFVQLKKGLASQMPQ 299 

Query: 301 EDFEV-DFTEVEQLKTEMFSDNDFYYFEMLGDNYHVEDLIGIAWGNSDTIYATSNVSLLQ 359

E V + EV + ++FS D +YFE L DNYH E +IG AWG+ + IYA++++ LL 
Sbjct: 300 EPVAVISYQEVTNVSADLFSAEDIFYFETLRDNYHREAIIGFAWGHGEQIYASTDLGLLA 359 

Query: 360 EALFKKALSKPIKTYDFKRSKOTjIjNRFNIDLPEPAFDTRLAKYLLSTTEDNLVSTIARLY 419 
FK+ KPI TYDFKRSKVLL+ I+L P++D RLA YLLST EDN +STIAR++

Sbjct: 360 TDSFKQVFQKPIATYDFKRSKVLLSHLGIELVAPSYDARLANYLLSTVEDNELSTIARIF 419 

Query: 420 TNLPLDTDDAVYGKGAKRAIPEKTRFLEHLAKXVKVLVDSEANIMQQLKANEQEELLFEM 479 
T++ L+ DD VYGKGAKRA+P+K LEHLA+KVKVL+DS++ ++ +L A+EQ +L + 
Sbjct: 420 TDISLEEDDTVYGKGAKRAVPDKDVLLEHLARKVKVLLDSKSQMLDKLTAHEQLDLYQNI 479

Query: 480 EQPLANVIAKMEIRGIKVTCKNTIOTMAIENQKVIETLTQEIYEIAGQEFNINSPKQLG^ 539 

E PLANVLAKMEI GIKV + TL +MA +N+ +IE LTQEIY++AGQEFNINSPKQLG + 
Sbjct: 480 ELPIAimAKMEIEGIKVNRATLQDMAEQNKVIIEALTQEIYDMAGQEFNINSPKQLGSI 539 


Query: 540 LFETLGLPvEMTKKTKTGYSTAVDVLERLAPISPLVTKILEYRQITKLQSTYIIGLQDYI 599 

LFE + LP+EMTKKTKTGYSTAV+VLERLAPI+P+V KIL+YRQITKLQSTY+IGLQDYI 
Sbjct: 540 LFEKMQLPLEMTKKTKTGYSTAVNVLERLAPIAPIVAKILDYRQITKLQSTYVIGLQDYI 599 

Query: 600 LEDGKIHTRYVQDLTQTGRLSSSDPNLQNIPVRLEQGRLIRKAFVPSEDNAVLLSSDYSQ 659

L DGKIHTRYVQDLTQTGRLSS DPNLQNIP+RLEQGRLIRKAF PS ++AVLLSSDYSQ 
Sbjct: 600 LADGKIHTRYVQDLTQTGRLSSVDPNLQNIPIRLEQGRLIRKAFTPSHEDAVLLSSDYSQ 659 



Query: 660 IELRVLAHISKDEHLIAAFKEGADIHTSTAMRVFGIEKPENVTPNDRRNAKAVNFGIVYG 719
           IELRVLAHIS DEHLIAAF EGADIHTSTAMRVFGI++  +VT NDRRNAKAVNFGIVYG
Sbjct: 660 IELRVLAHISGDEHLIAAFNEGADIHTSTAMRVFGIDRAADVTANDRRNAKAVNFGIVYG 719

Query: 720 ISDFGLSHNLGIPRKLAKQyiDTYFERYPGIKNYMETVVREAKDKGYVETLFHRRRSLPD 779 

ISDFGLS+NLGI RK AK YIDTYFERYPGIK YME WREAKDKGYVETLF RRR LPD 
Sbjct: 720 ISDFGLSNNLGITRKQAKSYIDTYFERYPGIKAYMENVVREAKDKGYVETLFKRRRELPD 779 

Query: 780 INSRNFNIRQFAERTAINSPIQGSAADILKIAMINLDRVLDKGGYKSKMLLQVHDEIVLE 839 

INSRNFN+R FAERTAINSP I QGSAAD ILKIAMINLD+ L GG+++KMLLQVHDE1VLE 
Sbjct: 780 INSRNFNTOSFAERTAINSPIQGSAADILKIAMINLDKALQAGGFRAKMLLQVHDEIVLE 839 

Query: 840 VPNEEIGAIRELVTKTMESAISLSVPLIADENAGETWYEAK 880 

VPN+E+ AI++LV TME+A+ L+VPL DE+ G +WYEAK 
Sbjct: 840 VPNDELTAIKKLVKDTMEAAVDIAVPLCVDESTGHSWYEAK 880 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1135 

A DNA sequence (GBSx1211) was identified in S.agalactiae <SEQ ID 3523> which encodes the amino
acid sequence <SEQ ID 3524>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1880 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9571> which encodes amino acid sequence <SEQ ID 9572> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05860 GB:AP001514 unknown conserved protein [Bacillus halodurans] 
Identities = 72/134 (53%) , Positives = 94/134 (69%) , Gaps = 3/134 (2%) 

Query: 17 NPSDFMLKNYLTKAKTIAWGLSDRQETAAYQVSKIMQEAGYQI I PVNPKNAGQKILGQM 76 

NPSD +K L +AK IAWGLS + +Y VS MQ AGY+IIPVNP ++LG+ 
Sbjct: 4 NPSDEKI KQI LQEAKRIAWGLSGNPDRTSYMVSAAMQHAGYE 1 1 P VNP - - TVDEVLGEK 61 

Query: 77 TYASLKDVTEHIDIVNIFRRSEYLPDIAREFLEVDADIFWAQLGLESQEAETILKQAGHK 136 

SL+D+ +DIVN+FRRSE+LPD+ARE +E+ A +FWAQLGLE++EA L+Q G 
Sbjct: 62 AVPSLQDIEGAVDIVNVFRRSEHLPDVARETVEIGAPVFWAQLGLENKEAYDYLQQHGVT 121 

Query: 137 QIVMNKCLKVECQK 150 

I MN+C+KVE K 
Sbjct: 122 SI-MNRCIKVEHAK 134 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3525> which encodes the amino acid 
sequence <SEQ ID 3526>. Analysis of this protein sequence reveals the following: 

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0837 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 87/141 (61%) , Positives = 114/141 (80%)

Query: 11 MVYHFQNPSDFMLKNYLTKAKTIAWGLSDRQETAAYQVSKIMQEAGYQIIPVNPKNAGQ 70

++Y FQNPS+ +LK YL AKTIAWGLSDR++TAAY V+K MQ Y+IIPVNPK AGQ 
Sbjct: 1 VIYSFQNPSEDVLKAYLESAKTIAWGLSDRKDTAAYGVAKFMQAMDYRIIPVNPKLAGQ 60 


Query: 71 KILGQMTYASLKDVTEHIDIVNIFRRSEYLPDIAREFLEVDADIFWAQLGLESQEAETIL 130 

ILG+ YAS+K + +DIV++FRRSE+LP++AR+FL A +FWAQLGLE+QEA+TTL 
Sbjct: 61 LILGEKVYASIKAIPFEVDIVDVFRRSEFLPEVARDFIiAGQAKVFWAQLGLENQEAQTIL 120 

Query: 131 KQAGHKQIVMNKCLKVECQKL 151

+ AG + IVMN+CLK++ +L 
Sbjct: 121 RSAGKEAIVMNRCLKIDYLQL 141 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1136 

A DNA sequence (GBSx1212) was identified in S.agalactiae <SEQ ID 3527> which encodes the amino
acid sequence <SEQ ID 3528>. Analysis of this protein sequence reveals the following:

Possible site: 13
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3367 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9573> which encodes amino acid sequence <SEQ ID 9574> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3529> which encodes the amino acid
sequence <SEQ ID 3530>. Analysis of this protein sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4960 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 113/151 (74%), Positives = 133/151 (87%), Gaps = 1/151 (0%)

Query: 7 MDSHSHGHRPLDAYENVLEHLREKRIRITETRKAIISYMVNSREHPSAEKIYNDLLPEYP 66

MD HSH + LDAYENVLEHLREK IRITETRKAI ISYM+ S EHPSA+KIY DL P +P 
Sbjct: 1 MDIHSH-QQALDAYENVLEHLREKHIRITETRKAIISYMIQSTEHPSADKIYRDLQPNFP 59 






Query: 67 NMSLATVYNNLKVLVDEGFVTELKLCNYSTTYYDFMGHQHLNIACEDCGKI VDFVD 126

NMSLATVYNNLKVLVDEGFV+ELK+ N TTYYDFMGHQH+N+ CE CGKI DF+DVD++ 
Sbjct: 60 NMSIATVYNNLKVLVDEGFVSELKISNDLTTYTO 119 



Query: 127 DISREAHQQTGFEVTRVQLVAYGICPECQRK 157

DI++EAH+QTG++VTR+ ++AYGICP+CQ K
Sbjct: 120 DIAKEAHEQTGYKVTRIPVIAYGICPDCQAK 150






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1137

A DNA sequence (GBSx1213) was identified in S.agalactiae <SEQ ID 3531> which encodes the amino
acid sequence <SEQ ID 3532>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.13 Transmembrane 16 - 32 ( 14 - 32) 
INTEGRAL Likelihood = -1.81 Transmembrane 496 - 512 ( 496 - 515) 



Final Results 

bacterial membrane Certainty=0.1850 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA06650 GB:AJ005645 sdrc [Staphylococcus aureus] 
Identities = 41/146 (28%) , Positives = 63/146 (43%) , Gaps = 13/146 (8%) 

Query: 4 SQYNKWSIRRLKVGAASVMIASGSIVALGQSHIVSAD EMSQPKTTITAPTANTSTN 59 

++ NK+SIR+ VG AS+++ + I L +A+ E++Q K TAP+ N +T 

Sbjct: 16 NRLNKFSIRKYSVGTASILVGTTLIFGLSGHEAKAAEHTNGELNQSKNETTAPSENKTT- 74 

Query: 60 VES STDKALS KVTTMETSSEMPK- - MQNMAKVEKTSDKPMMVATSVRKMMATPTPVAMT - 116 

D K T +++ PK M + A V++TS + T T T 

Sbjct: 75 - - KKVDSRQLKDNTQTATADQPKVTMSDSATVKETSSNMQSPQNATANQSTTKTSNVTTN 132 

Query: 117 KTTSVDEVKKSTDTAFKQTVD VP 139

TT +E KS T K P 
Sbjct: 133 DKSSTTYSNETDKSNLTQAKDVSTTP 158 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8735> and protein <SEQ ID 8736> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 

McG: Discrim Score: -0.92 

GvH: Signal Score (-7.5) : -2.48 
Possible site: 39 

>>> Seems to have no N-terminal signal sequence 

ALOM program count: 2 value: -2.13 threshold: 0.0 

INTEGRAL Likelihood = -2.13 Transmembrane 16 - 32 ( 14 - 32) 
INTEGRAL Likelihood = -1.81 Transmembrane 496 - 512 ( 496 - 515) 
PERIPHERAL Likelihood = 7.96 402 
modified ALOM score: 0.93 

*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.1850 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 485-489 

The protein has homology with the following sequences in the databases: 

D|598l|5780 leukotoxin > Insert characterized 

SP|P16462|HLYA_ACTAC LEUKOTOXIN. > Edit characterized 

GP| 141834 |gb|AAA21922.1 | |M27399 leukotoxin (LtA) {Actinobacillus actinomycetemcomitans} 
Insert characterized 



Query: 210 VSLNGNTTGKEGQALLDQI | AND KHSYQATIRVYGAKDGKVDLKNMISPKMVTINIP 266 

++ NG+ + G+A +D +K + KHS + T ++ G +DL + +T P 



WO 02/34771 PCT/GB01/04789 

Sbjct: 488 ITRNGDRI-QSGKAYVDYLKKGEELAKHSDKFTKQILDPIKGNIDLSGIKGSTTLTFLJSrP 546

Query: 267
+T E+++ E+++K+KKGP GV+ + A
Sbjct: 547

Query: 323 ILKASEGAKWSDNGVDKNSPLL PLKDLTKGKYFYQVSLNGNTAGKKGQALLD 376
L A+ GAK V S ++ + D +KG+ ++++G A K GQ ++
Sbjct: 607

Query: 377
+G+ Q T++ TK GKV
Sbjct: 666



SEQ ID 3532 (GBS1) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 1 (lane 3; MW 78kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 2 (lane 3; MW 53kDa). 

The His-fusion protein was purified as shown in Figure 189, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1138 

A DNA sequence (GBSx1214) was identified in S.agalactiae <SEQ ID 3533> which encodes the amino
acid sequence <SEQ ID 3534>. This protein is predicted to be response regulator (regX3). Analysis of this 
protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3585 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB54578 GB:AJ006397 response regulator [Streptococcus pneumoniae] 
Identities = 143/228 (62%) , Positives = 183/228 (79%) , Gaps = 1/228 (0%)

Query: 1 MTQKLiLLVDDEFEIIDINRRYLEQAGyEVSVAADGIEALKEVDENRFDLIISDIMMPKMD 60 

M + +LLVDDE EI DI++RYL QAGY+V VA DG+EAL+ + DLII+D+MMP+MD 
Sbjct: 1 MGK^IDLVDDEVEITDIHQRYLIQAGYQVLVAHDGLEALELFKKKPIDLIITDVMMPRMD 60 

Query: 61 GYDFISEVLVREPNQPFLFITAKVSEPDKIYSLSMGADDFISKPFSPRELVLRVKNILRR 120 

GYD ISEV P QPFLFITAK SE DKIY LS+GADDFI+KPFSPRELVLRV NILRR 
Sbjct: 61 GYDLISEVQYLSPEQPFLFITAKTSEQDKIYGLSLGADDFIAKPFSPRELVLRVHNILRR 120 

45 Query: 121 IYGIfflQQSEVLTIGDLVIDQKQRLvMVDfOTISLTOKBFDLLWILANHIiNRVFSKTELYE 180 

++ ++E++++G+L ++ V + + LT KSF+LLWILA++ RVFSKT+LYE 

Sbjct: 121 LH-RGGETELISLGNLKMNHSSHEVQIGEEMLDLTVKSFELLWILASNPERVFSKTDLYE 179 

Query: 181 RWGEEFLDDTOTLNVHIHALR1TOLAKFSTDNTPTIKTVWGLGYKLEE 228 
50 ++W E+++DDTNTLNVHIHALR +IAK+S+D TPTIKTVWGLGYK+E+ 

Sbjct: 180 KIWKEDYVDDTNTLNVHIHALRQELAKYSSDQTPTIKTVWGLGYKIEK 227 

There is also homology to SEQ ID 1182.



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1139

A DNA sequence (GBSx1215) was identified in S.agalactiae <SEQ ID 3535> which encodes the amino
acid sequence <SEQ ID 3536>. This protein is predicted to be histidine kinase (resE). Analysis of this 
protein sequence reveals the following: 

Possible site: 25

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.13 Transmembrane 42 - 58 ( 33 - 65)
INTEGRAL Likelihood = -7.54 Transmembrane 7 - 23 ( 3 - 29)

Final Results

bacterial membrane Certainty=0.4652 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB54579 GB:AJ006397 histidine kinase [Streptococcus pneumoniae] 
Identities = 190/343 (55%) , Positives = 249/343 (72%) 

Query: 1 MKLKYYIVIGYLISMLITVAGVFFGLNHMLIETRGVYYILSVTIIACIVGGIVNLFLLSS 60
MKLK YI++GY+IS L+T+ VF+ + MLI +Y++L +TI+A +VG ++LFLL
Sbjct: 1 MKLKSYILVGYIISTLLTILWFWAVQKMLIAKGEIYFLLGMTIVASLVGAGISLFLLLP 60

Query: 61 VFTSLKKLKQKMKDISQRCFDTKAQICSPQEFKDLETAFNQMSSELESTFKSLNESEREK 120
VFTSL KLK+ K ++ + F + ++ P EF+ L FN+MS +L+ +F SL ESEREK
Sbjct: 61 VFTSLGKLKEHAKRVAAKDFPSNLEVQGPVEFQQLGQTFNEMSHDLQVSFDSLEESEREK 120

Query: 121 r^IAQLSHDIKTPITSIQSTVEGILDGIISEEEVNYYLNTISRQTNRIiNHLVEELSFIT 180
+MIAQLSHDIKTPITSIQ+TVEGILDGII E E +YL TI RQT RLN LVEEL+F+T
Sbjct: 121

Query: 181
L T + E +++I+LDKLLI+ +SEFQ + E+E R V + V P+ +++ Y KLSR
Sbjct: 181

Query: 241
IL+NL+ NA KYS PG+ L + A + + I + D+G GI EDL +IF RLYRVE+SR
Sbjct: 241

Query: 301
NMKTGGHGLGL IAR+LAHQL G+I V SQY GS F+LVL L
Sbjct: 301 NMKTGGHGLGLAIARELAHQLGGEITVSSQYGLGSTFTLVLNL 343

There is also homology to SEQ ID 1178. 

A related GBS gene <SEQ ID 8737> and protein <SEQ ID 8738> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 3
McG: Discrim Score: 8.67
GvH: Signal Score (-7.5): -5.75
Possible site: 25
>>> Seems to have an uncleavable N-term signal seq

ALOM program count: 2 value: -9.13 threshold: 0.0

INTEGRAL Likelihood = -9.13 Transmembrane 42 - 58 ( 33 - 65)
INTEGRAL Likelihood = -7.54 Transmembrane 7 - 23 ( 3 - 29)
PERIPHERAL Likelihood = 3.92 196
modified ALOM score: 2.33

*** Reasoning Step: 3

Final Results

bacterial membrane
bacterial outside
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

55.3/72.7% over 343aa
Streptococcus pneumoniae
GP|5830539| histidine kinase Insert characterized

ORF00129(301 - 1332 of 1635)
GP|5830539|emb|CAB54579.1||AJ006397(1 - 344 of 350) histidine kinase {Streptococcus pneumoniae}
%Match = 34.0
%Identity = 55.2 %Similarity = 72.7

Matches = 190 Mismatches = 94 Conservative Sub.s = 60 
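
The figures in this header are mutually consistent under a simple reading, sketched below. This reading is an inference from the arithmetic and is not stated in the text: the aligned length is taken as matches plus mismatches plus conservative substitutions, identity as matches over that length, similarity as matches plus conservative substitutions over that length, and the ORF span in nucleotides is divided by three to give codons. The function name `alignment_summary` is hypothetical.

```python
def alignment_summary(matches, mismatches, conservative, nt_start, nt_end):
    """Recompute header statistics from the raw counts.

    Hypothetical helper; the formulas are assumptions inferred from the
    reported numbers, not a documented definition.
    """
    aligned = matches + mismatches + conservative
    # Inclusive nucleotide span of the ORF, three nucleotides per codon.
    codons = (nt_end - nt_start + 1) // 3
    return {
        "aligned_residues": aligned,
        "orf_codons": codons,
        "identity_pct": round(100 * matches / aligned, 1),
        "similarity_pct": round(100 * (matches + conservative) / aligned, 1),
    }


# Values reported for ORF00129 (301 - 1332 of 1635) against CAB54579.
summary_1139 = alignment_summary(190, 94, 60, 301, 1332)
print(summary_1139)
```

Under these assumptions the recomputed identity and similarity agree with the reported %Identity and %Similarity, and the codon count agrees with the 1 - 344 span given for the database sequence.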

42 72 102 132 162 192 222 252 

VIWLSTKNMW*WWTAIQFP*PINHLTCFGY*QII*IWFQKQSFMWSGAKNF*MTLIL*MFISMPYAMTLLNLVQTIP 

282 312 342 372 402 432 462 492 

20 QLSKQFGD*GIN*RNKMKLKYYIVIGYLISMLITVAGVFFGLNHMLIETRGVYYILSWIIACIVGGIVNLFLLSSVFTS 

llll ll==ll=ll hh 11= : III = |: = l =lhl =11 = = 1111 Mil 

MKLKSYILVGYIISTLLTILWFWAVQKMLIAKGEIYFLLGMTIVASLVGAGISLFLLLPVFTS 
10 20 30 40 50 60 

25 522 552 582 612 642 672 702 732 

LKKLKQKMKDISQRCFDTKAQICSPQEFKDLETAFNQMSSELESTFKSLNESEREKTMMIAQLSHDIKTPITSIQSTVEG 

i in= i == = i = == i n= i 11 = 11 =i= =i ii nun =iniiiiimnnn = mi 

LGKLKEHAKRVAAKDFPSNLEVQGPVEFQQLGQTFNEMSHDLQVSET3SLEESEREKGLMIAQLSHDIKTPITSIQATVEG 
80 90 100 110 120 130 140 

762 792 822 852 882 912 942 972 

ILDGIISEEEVNYYLNTISRQTNRLNHLWELSFITLETMSDTAEPHKEETIYLDKIiLIDILSEFQLVFEKENRQVMIDV 

llllll I I =11 II III III 11111=1=11 I = I ===|:||||||= =1111=: |:| | | : | 
ILDGIIKESEQAHYIATIGRQTERLNKLVEEIiNFLTLNTARNQvETTSKDSIFLDKLLIECMSEFQFLIEQERRDVHLQ 
35 160 170 180 190 200 210 220 

1002 1032 1062 1092 1122 1152 1182 1212 

APDVSKLSSQYDKLSRILIjNLISNAXKYSDPGSPLTIKAYSNRQDIVIDIIDQGYGIKDEDIASIFNRLYRVESSRlSMKT 

1= = = = I 111111 = 11= II III 11= I = I = = I = 1 = 1 II III =11 Ml -Mill 

40 I PESARIEGDYAKLSRILVNLVDNAFKYSAPGTKLEWAKLEKDQLS I S VTDEGQGIAPEDLENI FKRLYRVETSRNMKT 

240 250 260 270 280 290 300 

1242 1272 1302 1332 1362 1392 1422 1452 

GGHGLGLYIARQLAHQLNGDILVESQYQKGSKFSLVLKLQK*LGIIPSYFL*CFYKRLSAQ*FGKEGDRYRLIRN*RL*G 

45 mini nmi iii 1 = 1 mi n i = m i 

GGHGLGLAIARELAHQLGGEITVSSQYGLGSTFTLVLNLSGSENKA 
320 330 340 350 

SEQ ID 8738 (GBS28) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 14 (lane 3; MW 64kDa). It was also expressed in E.coli as a His-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 85 (lane 5; MW 38.8kDa) and in Figure 157
(lanes 9-11; MW 39kDa).

GBS28-His was purified as shown in Figure 221, lanes 6-7.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1140 

A DNA sequence (GBSx1216) was identified in S.agalactiae <SEQ ID 3537> which encodes the amino
acid sequence <SEQ ID 3538>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -7.70 Transmembrane 125 - 141 ( 110 - 155)
INTEGRAL Likelihood = -7.59 Transmembrane 38 - 54 ( 36 - 56)
INTEGRAL Likelihood = -6.48 Transmembrane 146 - 162 ( 143 - 174)
INTEGRAL Likelihood = -5.57 Transmembrane 72 - 88 ( 63 - 93)
INTEGRAL Likelihood = -1.33 Transmembrane 229 - 245 ( 227 - 245)

Final Results 

bacterial membrane Certainty=0.4079 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9575> which encodes amino acid sequence <SEQ ID 9576> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA79984 GB:Z21972 ORF1 [Bacillus megaterium] 
Identities = 35/119 (29%) , Positives = 62/119 (51%) , Gaps = 15/119 (12%) 

Query: 142 SSFRLLLSGNLILAPVLIWSSLITTKAVIKLV QQYYSYSISTLVFYTQLESGNYEG 198 

+SF+L+ +++ A + + S L+ +IK + QQ++ + YT LE+ 
Sbjct: 105 TSFKLI -GASILQAIFIFLWSLLLIIPGIIKAIAYSQQFFL- -LKDHPEYTVLEA 156 

Query: 199 PSKVLVASRELMNGNKLRLFLLDLSFIGWQFLTIFSFGLVYIYLLPYQTTARLIFYRNI 257 

+ S++ M G K + FL+ LSFIGW L +F+ G+ ++L+PY T FY + 
Sbjct: 157 ITESKKRMKGLKWKYFLMHLSFIGWGILCMFTLGIGLLWLIPYAGTTTAAFYEEL 211 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3539> which encodes the amino acid 

sequence <SEQ ID 3540>. Analysis of this protein sequence reveals the following: 

Possible site: 54 
>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-10.08 Transmembrane 148 - 164 ( 143 - 170)
INTEGRAL Likelihood = -8.28 Transmembrane 114 - 130 ( 101 - 141)
INTEGRAL Likelihood = -6.69 Transmembrane 60 - 76 ( 49 - 82)
INTEGRAL Likelihood = -3.72 Transmembrane 21 - 37 ( 21 - 39)
INTEGRAL Likelihood = -2.34 Transmembrane 222 - 238 ( 221 - 239)



Final Results 

bacterial membrane Certainty=0.5034 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA79984 GB:Z21972 ORF1 [Bacillus megaterium]
Identities = 63/220 (28%) , Positives = 100/220 (44%) , Gaps = 31/220 (14%) 

Query: 62 LGLILSLFILSASFTMI-DWRHFRQKVSFAESTTAFSKEFFGNLLVLAITKWLFFLIWS 120 

+ L+L LF+++ F +1 +V+ + T + F + +A+ L S 

Sbjct: 22 VSLMLLLFLINLVFPLIVEVIGSGGFSEWLMQEETPLWSDIFSMVFSIALIP LTIS 77 

Query: 121 LIWFF GLF I FLSGLSAFLVNAKSGSSTVI SL I FLLFGA VLSLIGFGI 167 

WF+ 1+ G ++F + G+S + ++ L+ +L + G 
Sbjct: 78 TTWFYLNLVREGNPGIPEVFAIYKDGKTSFKL IGASILQAIFIFLWSLLLIIPG 131 

Query: 168 YINRYYAYSLSEYLLYDEVKEGTYLGAIAVIETSVAMMKGYKWKLFFLQLSFTGWFLLNI 227 

I + AYS +LL D E T L AI S MKG KWK F + LSF GW +L + 

Sbjct: 132 - IIKAIAYSQQFFLLKDH- PEYTVLEAIT ESKKRMKGLKWKYFLMHLSFIGWGILCM 186 

Query: 228 VTFGLLNIYLLPYFTTANVIFYDQLKKRFKDKDD- -PIEG 265 

T G+ ++L+PY T FY++L +D DD IEG 
Sbjct: 187 FTLGIGLLWLIPYAGTTTAAFYEELIVPQEDIDDDQQIEG 226

An alignment of the GAS and GBS proteins is shown below.

Identities = 87/254 (34%) , Positives = 137/254 (53%) , Gaps = 10/254 (3%) 

Query: 16 MTNSEIKNEAKTILSNLQGKNQLFLLPILLSIITLYISFYYQYN NMTLLDFFVPL 70 

5 M+ IK +A+ L NL GK LFL+P LL + I + Y ++L + PL 

Sbjct: 1 MSIKAIKGQARDTLKNLSGKYLLFLIPTLLFMFHFGIEIHQGYVLSSGIEVSLAASYFPL 60 

Query: 71 PWFFYTLFIISVSFVMLDWKNQKLJ^FSDmYWSSHIFWKIjLSvLVLKGLILSFFY 130 
+ +LFI+S SF M+DW++ + V F+++T FS _ F LL + + K L + 
10 Sbjct: 61 LLGLILSLFILSASFTMIDVVRHFRQKVSFJffiSTTAFSKEFFGNLLVLAITKWLFFLIWS 120 

Query: 131 LLSTFGLLIIISSFRLLL SGNLILAPVLIVVSSLITTKAVIKLVQQYYSYSISTL 185 

L+ FGL I +S L + +++ + ++ ++++ + +YY+YS+S 

Sbjct: 121 LIWFFGLFIFLSGLSAFLWAKSGSSTVISLIFLLFGAVLSLIGFGIYINRYYAYSLSEY 180 

Query: 186 VFYTQLESGNYEGPSKVLVASRELMNGNKLRLFLLDLSFIGWQFLTIFSFGLVYIYLLPY 245 

+ Y +++ G Y G V+ S +M G K +LF L LSF GW L I +FGL+ IYLLPY 
Sbjct: 181 LLYDEVKEGTYLGAIAVIETSVAraKGYKWKLFFLQLSFTGWFLLNIVTFGLLNIYLLPY 240 

20 Query: 246 QTTARLI FYRNITK 259 

TTA +IFY + K 
Sbjct: 241 FTTANVT FYDQLKK 254 

A related GBS gene <SEQ ID 8739> and protein <SEQ ID 8740> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 4
McG: Discrim Score: -11.32
GvH: Signal Score (-7.5): -5.39
Possible site: 19
>>> Seems to have no N-terminal signal sequence

ALOM program count: 5 value: -7.70 threshold: 0.0

INTEGRAL Likelihood = -7.70 Transmembrane 125 - 141 ( 110 - 155)
INTEGRAL Likelihood = -7.59 Transmembrane 38 - 54 ( 34 - 56)
INTEGRAL Likelihood = -6.48 Transmembrane 146 - 162 ( 143 - 174)
INTEGRAL Likelihood = -5.57 Transmembrane 72 - 88 ( 63 - 93)
INTEGRAL Likelihood = -1.33 Transmembrane 229 - 245 ( 227 - 245)
PERIPHERAL Likelihood = 0.37 105
modified ALOM score: 2.04

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.4079 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



ORF00498(901 - 1071 of 1383)

EGAD|19922|20421(155 - 211 of 226) hypothetical protein {Bacillus megaterium}
GP|288299|emb|CAA79984.1||Z21972 ORF1 {Bacillus megaterium} PIR|S32215|S32215 hypothetical protein 1 - Bacillus megaterium
%Match = 4.8
%Identity = 36.8 %Similarity = 61.4
Matches = 21 Mismatches = 22 Conservative Sub.s = 14



741 771 801 831 861 891 921 951 

LIIISSFRLLLSGNLIIAPVLIWSSLITTKAVIKLVQQYYSYSISTLWYTQLESGNYEGPSKVLVASRELMNGNKLRL 



GIPEVFAIYKDGKTSFKLIGASILQAIFIFLWSLLLIIPGIIKAIAYSQQFFLLKDHPEYTVLEAITESKKRMKGLKWKY 
60 110 120 130 140 150 160 170 

981 1011 1041 1071 1101 1131 1161 1191 

FLLDLSFIGWQFLTIFSFGLVYIYLLPYQTTARLIFYRNITKNS*E*FLAIFVI*VLKRTYCLFDTDFRPKYPHSVDVQV 
11= III: I :| = :|: -Ml I II = 
FLMHLSFIGWGILCMFTLGIGLLWLIPYAGTTTAAFYEELIVPQEDIDDDQQIEG
190 200 210 220 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1141 

A DNA sequence (GBSx1217) was identified in S.agalactiae <SEQ ID 3541> which encodes the amino
acid sequence <SEQ ID 3542>. This protein is predicted to be tRNA-guanine transglycosylase (tgt). 
Analysis of this protein sequence reveals the following: 

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3706 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9577> which encodes amino acid sequence <SEQ ID 9578>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14731 GB:Z99118 tRNA-guanine transglycosylase [Bacillus subtilis] 
Identities = 269/377 (71%) , Positives = 320/377 (84%) 

Query: 12 MTDHPIKYRLIKQEKHTGARLGEIITPHGTFPTPMFMPVGTQATVKTQSPEELKEMGSGI 71 
25 M + PI+Y IK+ K TGARLG++ TPHG+F TP+FMPVGT ATVKT SPEELK M +GI 

Sbjct: 1 MAEQPIRYEFIKECKQTGARIX3CTHTPHGSFETP^ 60 

Query: 72 ILSNTYHLWLRPGDELIAKAGGLHKFMNWDQAILTDSGGFQVYSLADSRNITEEGVTFKN 131 
ILSNTYHLWLRPG +++ +AGGLHKFMNWD+AILTDSGGFQV+SL+ RNI EEGV F+N 
30 Sbjct: 61 ILSNTYHLWLRPGQDIVKEAGGLHKFMNWDRAILTDSGGFQVFSLSKFRNIEEEGVHFRN 120 

Query: 132 HLNGAKMFLSPEKAISIQNNLGSDIMMSFDECPQFYQPYDYVKKSIERTSRWAERGLNAH 191 

HLNG K+FLSPEKA+ IQN LGSDIMM+FDECP + YDY+K+S+ERTSRWAER LNAH 
Sbjct: 121 HLNGDKLFLS PEKAME I QNALGSD IMMAFDECPPYPAE YDYMKRSVERTSRWAERCLNAH 180 

Query: 192 RRPHDQGLFGIVQGAGFEDLRRQSARDLVSMDFPGYSIGGLAVGETHDEMNAVLDFTVPM 251 

R +QGLFGIVQG +EDLR QSA+DL+S+DFPGY+IGGL+VGE D MN VL+FT P+ 
Sbjct: 181 NRQDEQGLFGIVQGGEYEDLRTQSAKDLISLDFPGYAIGGLSVGEPKDVMNRVLEFTTPL 240 

40 Query: 252 LPNDKPRYLMGVGAPDSLIDAVIRGVDMFDCVLPTRIARNGTCMTSQGRLVVKNAKFAED 311 

LP DKPRYLMGVG+PD+LID IRGVDMFDCVLPTRIARNGT T++GRL +KNAKF D 
Sbjct: 241 LPKDKPRYLMGVGSPDALIDGAIRGVDMFDCVLPTRIARNGTVFTAEGRLNMKNAKFERD 300 

Query: 312 FTPLDPNCDCYTCKNYTRAYIRHLLKADETFGIRLTSYHNLYFLVNLMKDVRQAIMDDNL 371 
45 F P+D CDCYTCKNYTRAYIRHL++ +ETFG+RLT+YHNL+FL++LM+ VRQAI +D L 

Sbjct: 301 FRPIDEECDCYTCKNYTRAYIRHLIRCNETFGLPJjTTYHNLHFLLHLMEQVRQAIREDRL 360 

Query: 372 LEFRQDFMERYGYGMNN 388 
+FR++F ERYGY N 
50 Sbjct: 361 GDFREEFFERYGYNKPN 377 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3543> which encodes the amino acid 
sequence <SEQ ID 3544>. Analysis of this protein sequence reveals the following: 

Possible site: 43
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2590 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 351/380 (92%) , Positives = 368/380 (95%) 



Query: 12 MTDHPIKYRLIKQEKHTGARLGEIITPHGTFPTPMFMPVGTQATVKTQSPEELKEMGSGI 71
MTD+PIKYRLIK EKHTGARLGEIITPHGTFPTPMFMPVGTQATVKTQSPEELK +GSGI
Sbjct: 1 MTDYPIKYRLIKAEKHTGARLGEIITPHGTFPTPMFMPVGTQATVKTQSPEELKAIGSGI 60

Query: 72 ILSNTYHLWLRPGDELIAKAGGLHKFMNWDQAILTDSGGFQVYSLADSRNITEEGVTFKN 131
ILSNTYHLWLRPGDELIA++GGLHKFMNWDQ ILTDSGGFQVYSLADSRNITEEGVTFKN
Sbjct: 61 ILSNTYHLWLRPGDELIARSGGLHKFMNWDQPILTDSGGFQVYSLADSRNITEEGVTFKN 120

Query: 132 HLNGAKMFLSPEKAISIQNNLGSDIMMSFDECPQFYQPYDYVKKSIERTSRWAERGLNAH 191
HLNG+KMFLSPEKAISIQNNLGSDIMMSFDECPQFYQPYDYVKKSIERTSRWAERGL AH
Sbjct: 121 HLNGSKMFLSPEKAISIQNNLGSDIMMSFDECPQFYQPYDYVKKSIERTSRWAERGLKAH 180

Query: 192 RRPHDQGLFGIVQGAGFEDLRRQSARDLVSMDFPGYSIGGLAVGETHDEMNAVLDFTVPM 251
RRPHDQGLFGIVQGAGFEDLRRQSA DLV+MDFPGYSIGGLAVGE+H+EMNAVLDFT P+
Sbjct: 181 RRPHDQGLFGIVQGAGFEDLRRQSAADLVAMDFPGYSIGGIAVGESHEEMNAVLDFTTPL 240

Query: 252 LPNDKPRYLMGVGAPDSLIDAVIRGVDMFDCVLPTRIARNGTCMTSQGRLVVKNAKFAED 311
LP +KPRYLMGVGAPDSLID VIRGVDMFDCVLPTRIARNGTCMTS+GRLV+KNAKFAED
Sbjct: 241 LPENKPRYLMGVGAPDSLIDGVIRGVDMFDCVLPTRIARNGTCMTSEGRLVIKNAKFAED 300

Query: 312 FTPLDPNCDCYTCKNYTRAYIRHLLKADETFGIRLTSYHNLYFLVNLMKDVRQAIMDDNL 371
FTPLD +CDCYTC+NY+RAYIRHLLKADETFGIRLTSYHNLYFLVNLMK VRQAIMDDNL
Sbjct: 301 FTPLDHDCDCYTCQNYSRAYIRHLLKADETFGIRLTSYHNLYFLVNLMKKVRQAIMDDNL 360

Query: 372 LEFRQDFMERYGYGMNNRNF 391
LEFRQDF+ERYGY  +NRNF
Sbjct: 361 LEFRQDFLERYGYNKSNRNF 380





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1142 

A DNA sequence (GBSx1218) was identified in S.agalactiae <SEQ ID 3545> which encodes the amino
acid sequence <SEQ ID 3546>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2479 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9303> which encodes amino acid sequence <SEQ ID 9304> 
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10795> which encodes amino 
acid sequence <SEQ ID 10796> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB16256 GB:Z99164 hypothetical protein [Schizosaccharomyces pombe]

Identities = 42/91 (46%) , Positives = 62/91 (67%) , Gaps = 3/91 (3%) 

Query: 6 FGIGLDSSSRCYHYHTKLDIVALKCaVCQKYYACYKCHDALEEHCFAA-TKSDETFP-VL 63 

+G +D+ +RC+HYH+K D+VAL+C C+K+YAC++CHD L H F K+ P V+ 
Sbjct: 13 YGKLVDNETRCFHYHSKADWALRCGQCEKFYACFQCHDELNTHPFLPWRKAKFHIPCVI 72

Query: 64 CGSCRQMLTLKEYK- TGFCPYCRMLENPNCQ 93

CG+C+ LT++EY+ T C YC FNP C+ 
Sbjct: 73 CGACKNSLTVEEYRSTVHCKYCNHPFNPKCK 103 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3547> which encodes the amino acid 
sequence <SEQ ID 3548>. Analysis of this protein sequence reveals the following: 



Possible site: 36 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.2769 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 55/93 (59%) , Positives = 62/93 (66%)

Query: 2 MQEYFGIGLDSSSRCYHYHTKLDIVALKCAVCQKYYACYKCHDALEEHCFAATKSDETFP 61 
20 M + FGI LD RC HYHT LDIV LKCA CQ YYACY CHD L +H F T ET P 

Sbjct: 1 MTDCFGIDLDQEYRCLHYHTPLDIVGLKCASCQTYYACYHCHDQLTDHAFVPTGHQETSP 60 

Query: 62 VLCGSCRQMLTLKEYKTGFCPYCRMLFNPNCQR 94 
V+CG CR++L+ EY G CPYC+ FNP C R 
25 Sbjct: 61 VICGHCRKLLSRAEYGCGCCPYCQSPFNPACHR 93 
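
The identity count for an alignment row can be reproduced by comparing the two rows position by position. The sketch below is illustrative: `count_identities` is a hypothetical helper, and the two strings are the final Query and Sbjct rows of the alignment above, copied without their coordinates.

```python
def count_identities(query, subject):
    """Count positions where both aligned rows carry the same residue.

    Hypothetical helper; gap characters ('-') never count as identities.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Final row of the GAS/GBS alignment (Query 62-94, Sbjct 61-93).
query_row = "VLCGSCRQMLTLKEYKTGFCPYCRMLFNPNCQR"
subject_row = "VICGHCRKLLSRAEYGCGCCPYCQSPFNPACHR"

identical = count_identities(query_row, subject_row)
print(identical, "identical positions out of", len(query_row))
```

Summing this count over all rows of an alignment gives the numerator of the Identities figure in its summary line.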

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1143 

A DNA sequence (GBSx1219) was identified in S.agalactiae <SEQ ID 3549> which encodes the amino
acid sequence <SEQ ID 3550>. This protein is predicted to be transport protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -9.45 Transmembrane 300 - 316 ( 292 - 321)
INTEGRAL Likelihood = -1.17 Transmembrane 265 - 281 ( 265 - 281)

Final Results

bacterial membrane Certainty=0.4779 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10113> which encodes amino acid sequence <SEQ ID
10114> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF12002 GB:AE002075 transport protein, putative [Deinococcus radiodurans] 
Identities = 108/295 (36%) , Positives = 174/295 (58%) , Gaps = 4/295 (1%) 

Query: 31 GAWINLVNPSQEESEQVADQFGIDIDDLRAPLDvEETSRISVEDDYTLVIVDVPTYEERN 90 
50 G WI+ P+ EE +V+ + G+++D L+ PLD +E SR ED L+I+ + 

Sbjct: 21 GCWIDAAAPTTEELARVSRETGLELDYLKYPLDPDERSRFEREDGQLLIIMQTSYRLAED 80 

Query: 91 NKSYYMTIPMGIIVTDNAVITTC-LEHLTLFDHFYRRRVKNFYTFMKTRFVFQLLYRNAE 149 
+ Y T+P+GI+ TD+ ++T C LE + V+ T K R QL RNA+ 

Sbjct: 81 SDIPYDTVPLGILHTDHCLVWCSLEENPWKDWSGLVRRVSTVKKNRLTLQLFLRNAQ 140

Query: 150 LYLQALRTIDRQSDKIKAQLESATRNEQLIDMMELEKSIVYLKASLKFNERIVKKLTSST 209

+L +R D IE ++E+ATRN +L+D+++LEKS+VY LK NE +++++ 

Sbjct: 141 RFLIDWQINKRVDAIEDKMENATRNRELI^LLKLEKSLVYFITGIjKANEAMMERVKRDR 200 

5 Query: 210 SSLKKYIEDEDLLEDTLIETQQAIEMANIYENVIiNAMTETTASIIG 269 

+ Y ED +LL+D LIE QAIEMA+I N+L +M AS+I NN N ++K L + T 
Sbjct: 201 I-FEMTEEDSELLDDVLIENLQAIE^SIASNILTSMAGAFASVINMiJVNQVVKVLTVTT 259 

Query: 270 MTLDIPWIFSAYG^FQNl^PLNGIAHGFIYVVLLAFLMSSFVVFYFIRKKWF 324 
10 + + IPT++ +GMN + +P + +GF V+ +A ++S + F F R K F 

Sbjct: 260 ILVAIPTLVSGFFGMNVEG- -LPFSDSPYGFWLVMTVAMGIASLLAFLFYRWKVF 312 

A related DNA sequence was identified in S.pyogenes <SEQ ID 715> which encodes the amino acid 
sequence <SEQ ID 716>. Analysis of this protein sequence reveals the following: 

Possible site: 61

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.81 Transmembrane 293 - 309 ( 288 - 311)
INTEGRAL Likelihood = -1.28 Transmembrane 255 - 271 ( 255 - 271)

Final Results

bacterial membrane Certainty=0.4524 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 272/314 (86%), Positives = 296/314 (93%) 

Query: 11 MKQMFLSTAIEFKEIETFEPGAWINLVNPSQEESEQVADQFGIDIDDLRAPLDVEETSRI 70 
MKQMFLS+AIEFKEIETFEPGAWI LVNPSQEES ++ADQF IDI DLRAPLDVEETSRI 
30 Sbjct: 1 MKQMFLSSAIEFKEIETFEPGAWIKLVNPSQEESMKIADQFNIDISDLRAPLDVEETSRI 60 

Query: 71 SVEDDYTLVIVDVPTYEERNNKSYYMTIPMGIIVTDNAVITTCLEHLTLFDHFYRRRVKN 130 

+ VEDDYTL+ 1 VDVP YEERNNKSYY+T+P+GI IVT+NA VI TTCL +TLFDHF+ RRVKN 
Sbjct: 61 AVEDDYTLIIVDVPIYEERNNKSYYITMPIiGIIVTENAVITTCLHDMTLFDHFHNRRVKN 120 

Query: 131 FYTFMKTRFVFQLLYRNAELYLQALRTIDRQSDKIEAQLESATRNEQLIDMMELEKSIVY 190 

FYTFMKTRFVFQ+LYRNAEL+L ALRTIDRQS+++EAQLE+ATRNE+LIDMMELEKSIVY 
Sbjct: 121 FYTFMKTRFVFQILYRNAELFLTALRTIDRQSERLEAQLEAATRNEELIDMMELEKSIVY 180 

40 Query: 191 LKASLKFNERIVKKLTSSTSSLKICYIEDEDLLEDTLIETQQAIEMANIYENVLNAMTETT 250 

LKASLKFNERIVKKL+SSTSSLKKYIEDEDLLEDTLIETQQAIEMA IYENVLNAMTETT 
Sbjct: 181 LKASLKFNERIVKKLSSSTSSLKKYIEDEDLLEDTLIETQQAIEMAGIYENVLNAMTETT 240 

Query: 251 ASIIGNNQNTIMKTLALOTMTLDIPTVIFSAYGI^FQNNWMPIjNGLAHGFIYVVLIAFLM 310 

45 ASH NNQNTIMKTLAL+TM ldiptvifsaygmnfqnnw+plngl h f y+ l+a l+ 

Sbjct: 241 ASI INNNQNTIMKTLALMTMALDI PTVI FSAYGMNFQNNWLPLNGLEHAFWYITLIAMLL 300 

Query: 311 SSFWFYFIRKKWF 324 
SSFW YFIRKKWF 
50 Sbjct: 301 SSFWI YFIRKKWF 314 

SEQ ID 3550 (GBS257) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 3; MW 35kDa), in Figure 169 (lane 9 & 10; MW 50kDa) and in Figure 
239 (lane 2; MW 50kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of 
total cell extract is shown in Figure 48 (lane 6; MW 60kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1144

A DNA sequence (GBSx1220) was identified in S.agalactiae <SEQ ID 3551> which encodes the amino
acid sequence <SEQ ID 3552>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-12.26 Transmembrane 158 - 174 ( 151 - 182) 

INTEGRAL Likelihood = -6.37 Transmembrane 93 - 109 ( 91 - 111) 

INTEGRAL Likelihood = -5.68 Transmembrane 188 - 204 ( 184 - 205) 

INTEGRAL Likelihood = -0.85 Transmembrane 118 - 134 ( 118 - 134) 



Final Results 

bacterial membrane Certainty=0.5904 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3553> which encodes the amino acid 
sequence <SEQ ID 3554>. Analysis of this protein sequence reveals the following: 

Possible site: 52 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.95 Transmembrane 92 - 108 ( 88 - 110)
INTEGRAL Likelihood = -6.69 Transmembrane 153 - 169 ( 151 - 177)
INTEGRAL Likelihood = -2.34 Transmembrane 183 - 199 ( 183 - 200)

Final Results

bacterial membrane Certainty=0.3781 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

An alignment of the GAS and GBS proteins is shown below.

Identities = 135/217 (62%) , Positives = 167/217 (76%) , Gaps = 1/217 (0%) 

Query: 1 MTLQDLTKKNQEFVHIATNQLLADGKSDAEIKAILEEHLPEIIDNQKKGITARSLLGAPT 60 
35 M LQ+LTKKNQEF+H ATN+L+ DGKSD +IK I LEE +P I++NQKKG+TAR+LLG PT 

Sbjct: 1 MELQELTKKNQEFIHTATNKLIQDGKSDEDIKLILEEAIPAILENQKKGVTARNLLGTPT 60 

Query: 61 TWAASFTERPEDKARVSVQKNTNPWLNTOLDTSLLFI^LOTALNGLMLLFGQSNVNTGLIS 120 
WAASF++ P KA KNTNPWLMWLDTSLLF+G+V LNG+M F + TGLIS 

40 Sbjct: 61 AWAASFSQDPSQKA-AETDKNTNPWLWLDTSLLFIGIVALLNGIMTFFNTNATVTGLIS 119 

Query: 121 ILTLGFGGGAAMYVTYYYIYRHMGKPKSERPGWLKSFAVLALVMLVWFALFAWPLLPAT 180 

+L LGFGGGA+MY TYY+ 1 YRH+GK KS RP W K A L+L ML+W AL++ LP + 
Sbjct: 120 LLALGFGGGASMYATYYFIYRHLGKDKSLRPSWFKIIAALSLAMLIWIALYSATAFLPTS 179 

Query: 181 INPKLPEWLFIIALASFGLRFYLQRKYNIQSSMAPV 217 

+NP+LP + L II S LR+YLQRKYNIQ++M+PV 
Sbjct: 180 LNPQLPPLALLI IGGVSLALRYYLQRKYNIQNTMSPV 216 

A related GBS gene <SEQ ID 10787> and protein <SEQ ID 10788> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: -9.94
GvH: Signal Score (-7.5): -3.66
Possible site: 29

>>> Seems to have no N-terminal signal sequence

ALOM program count: 4 value: -12.26 threshold: 0.0

INTEGRAL Likelihood =-12.26 Transmembrane 158 - 174 ( 151 - 182)
INTEGRAL Likelihood = -6.37 Transmembrane 93 - 109 ( 91 - 111)
INTEGRAL Likelihood = -5.68 Transmembrane 188 - 204 ( 184 - 205)
INTEGRAL Likelihood = -0.85 Transmembrane 118 - 134 ( 118 - 134)
PERIPHERAL Likelihood = 8.43 50
modified ALOM score: 2.95

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.5904 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1145 

A DNA sequence (GBSx1221) was identified in S.agalactiae <SEQ ID 3555> which encodes the amino
acid sequence <SEQ ID 3556>. Analysis of this protein sequence reveals the following:

Possible site: 28
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1348 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1146 

A DNA sequence (GBSx1222) was identified in S.agalactiae <SEQ ID 3557> which encodes the amino
acid sequence <SEQ ID 3558>. This protein is predicted to be excinuclease ABC (uvrA). Analysis of this
protein sequence reveals the following:

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1738 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10111> which encodes amino acid sequence <SEQ ID
10112> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC67271 GB:AF017113 excinuclease ABC subunit A [Bacillus subtilis] 
Identities = 642/940 (68%) , Positives = 785/940 (83%) , Gaps = 3/940 (0%) 

Query: 9 DKLMIRGARAHNLKNISVDIPRDKLVVVTGLSGSGKSSLAFDTiyAEGQRRYvESLSAYA 68 
50 D++ ++GARAHNLKNI V I PRD+L WVTGLSGSGKBSLAFDTIYAEGQRRYVESLSAYA 

Sbjct: 4 DRIETOGARAHNLKNIDOTIPRDQLVVVTGIiSGSGKSSLAFDTIYAEGQRRYVESLSAYA 63

Query: 69 RQFLGNMEKPDVDS IDGLS PAI S IDQKTTSKNPRSTVGTVTE INDYLRLLYARVGTPYCI 128

RQFLG M+KPDVD+ 1 +GLSPAI S IDQKTTS+NPRSTVGTVTE I DYLRLLYARVG P+C 
Sbjct: 64 RQFLGQMDKPDVDAIEGLSPAISIDQKTTSRNPRSTVGTVTEIYDYLRLLYARVGKPHCP 123 

Query: 129 NGHGAITASSVEQIVDKVLALPERTKMQILAPIIRRKKGQHKSTFEKIQKDGYVRVRIDG 188 

IT+ ++EQ+VD++L PERTK+Q+LAPI+ +KG H E+I+K GYVRVRIDG 
Sbjct: 124 EHGIEITSQTIEQMVDRILEYPERTKLQVLAPIVSGRKGAHVKVLEQIRKQGYVRVRIDG 183 

Query: 189 DIHDVTEVPELSKSKMHNIDIWDRLINKEGIRSRLFDSVEAALRLSDGYWIDTMDGNE 248 

++ ++++ EL K+K H+I++V4DR++ KEG+ +RL DS+E ALRL +G V+ID + E 
Sbjct: 184 EMAELSDDIELEKNKKHSIEWIDRIWKEGVftARLSDSLETALRLGEGRVMIDVIGEEE 243 

Query: 249 LLFSEHYSCPECGFTVPELEPRLFSFNAPFGSCPTCDGLGIKLEVDIDLVIPDRSKTLRE 308 

L+FSEH++CP CGF++ ELEPRLFSFN+PFG+CPTCDGLG+KLEVD DLVIP++ +L+E 
Sbjct: 244 LMFSEHHACPHCGFSIGELEPRLFSFNSPFGACPTCDGLGMKLEVDADLVIPNQDLSLKE 303 

Query: 309 GALVPWNPISSNYYPTMLEQAMTQFGVDMDTPFEKLSKAEQDLALYGSGEREFHFHYIND 368 

A+ PW PISS YYP +LE T +G+DMD P + L K + D LYGSG+ +F Y ND 
Sbjct: 304 NAVAPVWPISSQYYPQLLEAVCTHYGIDMDVPVKDLPKHQLDKVLYGSGDDLIYFRYEND 363 

Query: 369 FGGERNIDLPFEGVVNNINRRYHETNSDYTRNVMREYMIffiLKClirCHGYRLNDQALCVRV 428 

FG R ++ FEGV+ NI RRY ET SD+ R M +YM++ C TC GYRL +AL V + 
Sbjct: 364 FGQIREGEIQFEGVLRNIERRYKETGSDFIREQMEQYMSQKSCPTCKGYRLKKEALAVLI 423 

Query: 429 GGEEGLNIGQVSDLSIADHLELLETLRLSSNEQLIARPIIKEIHDRLSFIjNNVGLNYLNL 488 

+G +IG++++LS+AD L + L LS + IA I++EI +RLSFL+ VGL+YL L 
Sbjct: 424 ---DGRHIGKITELSVADALAFFKDLTLSEKDMQIANLILREIVERLSFLDKVGLDYLTL 480 

Query: 489 SRSAGTLSGGESQRIRLATQIGSNLSGVLYVLDEPSIGLHQRDNDRLIDSLKKMRDLGNT 548 

SR+AGTLSGGE+QRIRLATQIGS LSGVLY+LDEPS IGLHQRDNDRLI +LK MRDLGNT 
Sbjct: 481 SRAAGTLSGGEAQRIRLATQIGSRLSGVLYILDEPSIGLHQRDNDRLISALKNMRDLGNT 540 

Query: 549 LIVVEHDEDTMMAADWLIDVGPGAGAFGGEIVASGTPKQVAKNTKSITGQYLSGKKVIPV 608

LIWEHDEDTMMAAD+LID+GPGAG ' GG+++++GTP++V ++ S+TG YLSGKK IP+ 
Sbjct: 541 LIWEHDEDTMMAADYLIDIGPGAGIHGGQVISAGTPEEVMEDPNSLTGSYLSGKKFIPL 600 

Query: 609 PSERRVGNGRFLEIKGAAENNLQNLDVKFPLGKFIAVTGVSGSGKSTLINSILKKAVAQK 668 

P ERR +GR++EIKGA+ENNL+ ++ KFPLG F AVTGVSGSGKSTL+N IL KA+AQK 
Sbjct: 601 PPERRKPDGRYIEIKGASENNLKKVNAKFPLGTFTAVTGVSGSGKSTLVNEILHKALAQK 660 

Query: 669 LNRNSDKPGKYVSLEGIEYVDRLIDIDQSPIGRTPRSNPATYTGVFDDIRDLFAQTNEAK 728 

L++ KPG + ++G++++D++IDIDQ+PIGRTPRSNPATYTGVFDDIRD+FAQTNEAK 
Sbjct: 661 LHKAKAKPGSHKEIKGLDHLDKVIDIDQAPIGRTPRSNPATYTGVFDDIRDVFAQTNEAK 720 

Query: 729 IRGYKKGRFSFNVKGGRCESCSGDGIIKIEMHFLPDVYVPCEVCHGTRYNSETLEVHYKE 788 

+RGYKKGRFSFNVKGGRCE+C GDGI I KIEMHFLPDVYVPCEVCHG RYN ETLEV YK 
Sbjct: 721 TOGYKKGRFSFNVKGGRCEACRGDGIIKIEMHFLPDVYVPCEVCHGKRYNRETLEVTYKG 780

Query: 789 KNIAQILDMTVNDAVTFFAAIPKIARKLQTIKDVGLGYVTLGQPATTLSGGEAQRMKLAS 848

K+I+ +LDMTV DA++FF IPKI RKLQT+ DVGLGY+TLGQPATTLSGGEAQR+KLAS
Sbjct: 781 KSISDVLDMTVEDALSFFENIPKIKRKLQTLYDVGLGYITLGQPATTLSGGEAQRVKLAS 840

Query: 849 ELHKRSTGKSLYILDEPTTGLHADDIARLLKVLDRFVDDGNTVLVIEHNLDVIKTADHII 908 

ELHKRSTG++LYILDEPTTGLH DDIARLL VL R VD+G+TVLVIEHNLD+IKTAD+I+ 
Sbjct: 841 ELHKRSTGRTLYILDEPTTGLHVDDIARLLVVLQRLVDNGDTVLVIEHNLDIIKTADYIV 900 

Query: 909 DLGPEGGIGGGQIVAIGTPEEVAENPKSYTGYYLKEKLAR 948 

DLGPEGG GGG IVA GTPEE+ E +SYTG YLK + R 
Sbjct: 901 DLGPEGGAGGGTIVASGTPEEITEVEESYTGRYLKPVIER 940 
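
The summary line that heads the alignment above gives raw counts alongside percentages. The short Python sketch below shows how those counts can be parsed and the percentages recomputed; it is an illustration only, the helper names `parse_summary` and `percent` are hypothetical, and no claim is made about how the original alignment program rounded its figures.

```python
import re

# Matches "Identities = 642/940", "Positives = 785/940", "Gaps = 3/940".
SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")

def parse_summary(line):
    """Return {field: (count, alignment_length)} from an alignment summary line."""
    return {name: (int(n), int(d)) for name, n, d in SUMMARY_RE.findall(line)}

def percent(count, length):
    """Percentage of aligned columns, as a float."""
    return 100.0 * count / length

line = "Identities = 642/940 (68%) , Positives = 785/940 (83%) , Gaps = 3/940 (0%)"
stats = parse_summary(line)
for name, (count, length) in stats.items():
    print(f"{name}: {count}/{length} = {percent(count, length):.1f}%")
```

For this alignment the recomputed values, truncated to whole numbers, agree with the reported 68%, 83% and 0%.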

A related DNA sequence was identified in S.pyogenes <SEQ ID 3559> which encodes the amino acid 
sequence <SEQ ID 3560>. Analysis of this protein sequence reveals the following: 

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1138 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 835/940 (88%) , Positives = 896/940 (94%) 

Query: 7 MQDKLMIRGARAHNLKNISVDIPRDKLVVVTGLSGSGKSSLAFDTIYAEGQRRYVESLSA 66
MQ+K++I GARAHNLKNI V+IPRDKLVVVTGLSGSGKSSLAFDTIYAEGQRRYVESLSA
Sbjct: 11 MQNKIIIHGARAHNLKNIDVEIPRDKLVVVTGLSGSGKSSLAFDTIYAEGQRRYVESLSA 70

Query: 67 YARQFLGNMEKPDVDSIDGLSPAISIDQKTTSKNPRSTVGTVTEINDYLRLLYARVGTPY 126 

YARQFLGNMEKPDVDSIDGLSPAISIDQKTTSKNPRSTVGTVTEINDYLRLLYARVGTPY 
Sbjct: 71 YARQFLGNMEKPDVDSIDGLSPAISIDQKTTSKNPRSTVGTVTEINDYLRLLYARVGTPY 130 

Query: 127 CINGHGAITASSVEQIVDKVLALPERTKMQILAPIIRRKKGQHKSTFEKIQKDGYVRVRI 186

CINGHGAITASS EQIV++VLALPERT+MQILAP++RRKKGQHK+ FEKIQKDGYVRVR+
Sbjct: 131 CINGHGAITASSAEQIVEQVLALPERTRMQILAPVVRRKKGQHKTVFEKIQKDGYVRVRV 190

Query: 187 DGDIHDVTEVPELSKSKMHNIDIVVDRLINKEGIRSRLFDSVEAALRLSDGYVVIDTMDG 246

DGDI DVTEVPELSKSKMHNI++V+DRL+NK+GIRSRLFDSVEAALRL DGY++IDTMDG
Sbjct: 191 DGDIFDVTEVPELSKSKMHNIEVVIDRLVNKDGIRSRLFDSVEAALRLGDGYLMIDTMDG 250

Query: 247 NELLFSEHYSCPECGFTVPELEPRLFSFNAPFGSCPTCDGLGIKLEVDIDLVIPDRSKTL 306
NELLFSEHYSCP CGFTVPELEPRLFSFNAPFGSCPTCDGLGIKLEVD+DLV+PD SK+L

Sbjct: 251 NELLFSEHYSCPVCGFTVPELEPRLFSFNAPFGSCPTCDGLGIKLEVDLDLVVPDPSKSL 310

Query: 307 REGALVPWNPISSNYYPTMLEQAMTQFGVDMDTPFEKLSKAEQDLALYGSGEREFHFHYI 366
REGAL PWNPISSNYYPTMLEQAM FGVDMDTPFE L++ E+DL LYGSG+REFHFHY+
Sbjct: 311 REGALAPWNPISSNYYPTMLEQAMASFGVDMDTPFEALTEEERDLVLYGSGDREFHFHYV 370

Query: 367 NDFGGERNIDLPFEGVVNNINRRYHETNSDYTRNVMREYMNELK^ 426 

NDFGGERNID+PFEGW N+NRRYHETNSDYTRNVMR YMNEL C TCHGYRLNDQALCV 
Sbjct: 371 NDFGGERNI D I PFEGVVTNVNRRYHETNSDYTRNVMRGYMNELTI^TCHGYRIjNDQALCV 430 

Query: 427 RVGGEEGLNIGQVSDLSIADHLELLETLRLSSNEQLIARPIIKEIHDRLSFLNNVGLNYL 486 

VGGEEG + IGQ+S+LS IADHL+LLE L L+ NE IA+PI+KEIHDRL+FLNNVGLNYL 
Sbjct: 431 HVGGEEGTHIGQISELSIADHLQLLEELELTENESTIAKPIVKEIHDRLTFLNNVGLNYL 490 

Query: 487 NLSRSAGTLSGGESQRIRLATQIGSNLSGVLYVLDEPSIGLHQRDNDRLIDSLKKMRDLG 546

LSR+AGTLSGGESQRIRLATQIGSNLSGVLY+LDEPSIGLHQRDNDRLI+SLKKMRDLG
Sbjct: 491 TLSRAAGTLSGGESQRIRLATQIGSNLSGVLYILDEPSIGLHQRDNDRLIESLKKMRDLG 550

Query: 547 NTLIVVEHDEDTMMAADWLIDVGPGAGAFGGEIVASGTPKQVAKNTKSITGQYLSGKKVI 606
NTLIVVEHDEDTMM ADWLIDVGPGAG FGGEI ASGTPKQVAKN KSITGQYLSGKK I

Sbjct: 551 NTLIVVEHDEDTMMQADWLIDVGPGAGEFGGEITASGTPKQVAKNKKSITGQYLSGKKFI 610

Query: 607 PVPSERRVGNGRFLEIKGAAENNLQNLDVKFPLGKFIAVTGVSGSGKSTLINSILKKAVA 666 
PVP ERR GNGRF+EIKGAA+NNLQ+LDV+FPLGKFIAVTGVSGSGKSTL+NSILKKAVA
Sbjct: 611 PVPLERRSGNGRFIEIKGAAQNNLQSLDVRFPLGKFIAVTGVSGSGKSTLVNSILKKAVA 670

Query: 667 QKLNRNSDKPGKYVSLEGIEYVDRLIDIDQSPIGRTPRSNPATYTGVFDDIRDLFAQTNE 726 

QKLNRN+DKPGKY S+ GIE+++RLIDIDQSPIGRTPRSNPATYTGVFDDIRDLFAQTNE 
Sbjct: 671 QKLNRNADKPGKYHSISGIEHIERLIDIDQSPIGRTPRSNPATYTGVFDDIRDLFAQTNE 730 

Query: 727 AKIRGYKKGRFSFNVKGGRCESCSGDGIIKIEMHFLPDVYVPCEVCHGTRYNSETLEVHY 786 

AKIRGYKKGRFSFNVKGGRCE+CSGDGIIKIEMHFLPDVYVPCEVCHG RYNSETLEVHY 
Sbjct: 731 AKIRGYKKGRFSFNVKGGRCEACSGDGIIKIEMHFLPDVYVPCEVCHGRRYNSETLEVHY 790 

Query: 787 KEKNIAQILDMTVNDAVTFFAAIPKIARKLQTIKDVGLGYVTLGQPATTLSGGEAQRMKL 846

K KNIA++LDMTV+DA+ FF+AIPKIARK+QTIKDVGLGYVTLGQPATTLSGGEAQRMKL 
Sbjct: 791 KGKNIAEVLDMTVDDALVFFSAIPKIARKIQTIKDVGLGYVTLGQPATTLSGGEAQRMKL 850 

Query: 847 ASELHKRSTGKSLYILDEPTTGLHADDIARLLKVLDRFVDDGNTVLVIEHNLDVIKTADH 906 
ASELHKRSTGKSLYILDEPTTGLH DDIARLLKVL+RFVDDGNTVLVIEHNLDVIK+ADH

Sbjct: 851 ASELHKRSTGKSLYILDEPTTGLHTDDIARLLKVLERFVDDGNTVLVIEHNLDVIKSADH 910

Query: 907 IIDLGPEGGIGGGQIVAIGTPEEVAENPKSYTGYYLKEKL 946

IIDLGPEGG GGGQIVA GTPEEVA+ +SYTG+YLK KL 
Sbjct: 911 IIDLGPEGGDGGGQIVATGTPEEVAQVKESYTGHYLKVKL 950

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1147 

A DNA sequence (GBSx1223) was identified in S.agalactiae <SEQ ID 3561> which encodes the amino
acid sequence <SEQ ID 3562>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.5161 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12192 GB:Z99106 similar to multidrug resistance protein [Bacillus subtilis]
Identities = 198/481 (41%) , Positives = 300/481 (62%) , Gaps = 24/481 (4%)

Query: 9 IHGKPYNRTAMITLLLIATFAGVLNQTSLGTAIPTLMNSFNISLSTAQQATTWFLLANGI 68

I KP+NR+ ++ +LL F +LNQT L TA+P +M FN+ + AQ TT F+L NGI
Sbjct: 5 IEQKPFNRSVIVGILLAGAFVAILNQTLLITALPHIMRDFNVDANQAQWLTTSFMLTNGI 64

Query: 69 MIPVSAYLATRFSTKWLYVTSYVVLLIGLLMTTLAPTSNWNLFLVGRIIQAISVGISMPL 128

+IP++A+L +F+++ L +T+ + G ++ AP N+ + L RIIQA GI MPL
Sbjct: 65 LIPITAFLIEKFTSRALLITAMSIFTAGTVVGAFAP--NFPVLLTARIIQAAGAGIMMPL 122

Query: 129 MQVVMVlWFPPEQRGAflMGLNGLVVGLAPAIGPTLAGWILKQEFHFAGHDLTWRAIFLLP 188

MQ V + +FP E+RG AMG+ GLV+ APAIGPTL+GW ++ +WR++F +
Sbjct: 123 MQTVFLTIFPIEKRGQAMGMVGLVISFAPAIGPTLSGWAVEA--------FSWRSLFYII 174

Query: 189 LLILTVTTILSPFVLKDWDNKSVKLEVPSLILSIIGFGSFLWGFTNVATYGWGDIGYVI 248

Query: 429 LSSVAQNIITOTKPSKDLLTMNPLKYANQMI^NASLDGFHVSFAIGFVFAVLGLLVSLFLRK 489 

L SV N + + +A+L G + +F + V A++G L+S L+K 

Sbjct: 414 LVSVMSNQAAH AGTTNVKHAALHGMNAAFIVAAVIALVGFLLSFTLKK 461 

There is also homology to SEQ ID 46.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1148 

A DNA sequence (GBSx1224) was identified in S.agalactiae <SEQ ID 3563> which encodes the amino
acid sequence <SEQ ID 3564>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.81 Transmembrane 8 - 24 ( 5-30) 
INTEGRAL Likelihood = -7.32 Transmembrane 36 - 52 ( 31 - 54) 

Final Results

bacterial membrane Certainty=0.4524 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10109> which encodes amino acid sequence <SEQ ID
10110> was also identified.

A related GBS gene <SEQ ID 8743> and protein <SEQ ID 8744> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8

McG: Discrim Score: 9.52
GvH: Signal Score (-7.5): -3.4

Possible site: 22
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -7.32 threshold: 0.0

INTEGRAL Likelihood = -7.32 Transmembrane 11 - 27 ( 6 - 29)
PERIPHERAL Likelihood = 11.19 130
modified ALOM score: 1.96

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.3930 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8744 (GBS29) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 7 (lane 2; MW 25.6kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 15 (lane 6; MW 51kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1149

A DNA sequence (GBSx1225) was identified in S.agalactiae <SEQ ID 3565> which encodes the amino
acid sequence <SEQ ID 3566>. This protein is predicted to be aminopeptidase P (pepQ). Analysis of this 
protein sequence reveals the following: 

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0724 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA70068 GB:Y08842 aminopeptidase P [Lactococcus lactis]
Identities = 44/126 (34%) , Positives = 78/126 (60%)

Query: 6 RLTRCQTAISQLSCDALLITNLTNIFYLTGFSGTNATVLISPKHRIFVTDSRYALIAKNT 65 

R+ + + + + D+LLIT++ NIFYLTGFSGT TV ++ K IF+TDSRY+ +A+ 
Sbjct: 2 RIEKLKVKMLTENIDSLLITDMKNIFYLTGFSGTAGTVFLTQKRNIFMTDSRYSEMARGL 61 

Query: 66 VREFDIIISRBPLAAILKIIRDDALIAIGFETDISYHMYKHMVEVFEDYRLIEAPSWEK 125

++ F+II +R+P++ + ++ +++ +FE+Y+K++ L +V + 

Sbjct: 62 IKNFEIIETRDPISLLTELSASESVKNMAFEETVDYAFFKRLSKAATKLDLFSTSNFVLE 121 

Query: 126 LRMIKD 131

LR IKD 
Sbjct: 122 LRQIKD 127 

There is also homology to SEQ ID 3568. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1150 

A DNA sequence (GBSx1226) was identified in S.agalactiae <SEQ ID 3569> which encodes the amino
acid sequence <SEQ ID 3570>. This protein is predicted to be aminopeptidase P (pepQ-2). Analysis of this
protein sequence reveals the following:

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2508 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA70068 GB:Y08842 aminopeptidase P [Lactococcus lactis]

Identities = 131/205 (63%) , Positives = 163/205 (78%) , Gaps = 3/205 (1%) 

Query: 2 LDFIKPDRTTELQVANFLDFRMRELGATGPSFDFIVASGYRSAMPHGVASQKTIQSGETL 61 
L FI+P RT E++VANFLDF+MR+L A+G SF+ IVASG RS++PHGVA+ K IQ G+ + 
Sbjct: 149 LRFIEPGRT-EIEVANFLDFKMRDLEASGISFETIVASGKRSSLPHGVATSKMIQFGDPV 207

Query: 62 TLDFGCYYQHYVSDMTRTIHIGHVTDQEREIYDIVLKSNQAIIGNVKSGMKRCDYDYLAR 121 

T+DFGCYY+HY SDMTRTI +G V D+ R IY+ V K+N+A+I VK+GM YD + R 
Sbjct: 208 TIDFGCYYEHYASDMTRTIFVGSVDDKMRTIYETVRKANFALIKQVKAGMTYAQYDNIPR 267 

Query: 122 QVIENSGYGNHFTHGIGHGMGLDVHEIPYFGKS--EGVIASGMVVTDEPGIYLDNKYGVR 179

+VIE + +G + FTHGIGHG+GLDVHEIPYF +S E + SGMV+TDEPGIYL GVR
Sbjct: 268 EVIEKADFGQYFTHGIGHGLGLDVHEIPYFNQSMTENQLRSGMVITDEPGIYLPEFGGVR 327

Query: 180 IEDDLLITETGCEVLTSAPKELIVL 204 

IEDDLL+TE GCEVLT APKELIV+ 
Sbjct: 328 IEDDLLVTENGCEVLTKAPKELIVI 352

A related DNA sequence was identified in S.pyogenes <SEQ ID 3567> which encodes the amino acid 
sequence <SEQ ID 3568>. Analysis of this protein sequence reveals the following: 

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1450 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 145/203 (71%) , Positives = 171/203 (83%)

Query: 2 LDFIKPDRTTELQVANFLDFRMRELGATGPSFDFIVASGYRSAMPHGVASQKTIQSGETL 61

LDFIKP TTE +ANFLDFRMR+ GA+G SFD IVASGY SAMPHG AS K IQ+ E+L
Sbjct: 168 LDFIKPGTTTERDLANFLDFRMRQYGASGTSFDIIVASGYLSAMPHGRASDKVIQNKESL 227

Query: 62 TLDFGCYYQHYVSDMTRTIHIGHVTDQEREIYDIVLKSNQAIIGNVKSGMKRCDYDYLAR 121

T+DFGCYY HYVSDMTRTIHIG VTD+EREIY +VL +N+A+I +GM D+D + R
Sbjct: 228 TMDFGCYYNHYVSDMTRTIHIGQVTDEEREIYALVLAANKALIAKASAGMTYSDFDGIPR 287

Query: 122 QVIENSGYGNHFTHGIGHGMGLDVHEIPYFGKSEGVIASGMVVTDEPGIYLDNKYGVRIE 181

Q+I +GYG+ FTHGIGHG+GLD+HE P+FGKSE ++ +GMWTDEPGIYLDNKYGVRIE
Sbjct: 288 QLITEAGYGSRFTHGIGHGIGLDIHENPFFGKSEQLLQAGIWVTDEPGIYLDNKYGVRIE 347

Query: 182 DDLLITETGCEVLTSAPKELIVL 204

DDL+IT+TGC+VLT APKELIVL
Sbjct: 348 DDLVITKTGCQVLTLAPKELIVL 370

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1151 

A DNA sequence (GBSx1227) was identified in S.agalactiae <SEQ ID 3571> which encodes the amino
acid sequence <SEQ ID 3572>. This protein is predicted to be yfhC protein (comEB). Analysis of this 
protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1401 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05053 GB:AP001511 late competence operon required for DNA 
binding and uptake [Bacillus halodurans] 
Identities = 78/146 (53%) , Positives = 107/146 (72%) 

Query: 1 MNRLSWEDYFMANAELISKRSTCDRAFVGAVLVKNNRIIATGYNGGVSETDNCNEVGHYM 60

MNR+SW+ YFMA + L++ RSTC R VGA +V++ RIIA GYNG +S +C + G Y+
Sbjct: 1 MNRISWDQYFMAQSHLIALRSTCTRLMVGATIVRDKRIIAGGYNGSISGGPHCIDEGCYV 60

Query: 61 EDGHCIRTVHAEMNALIQCAKEGISTNNTEIYVTHFPCINCTKALLQAGVKKITYKANYR 120

+GHCIRT+HAE+NAL+QCAK G+ T EIYVTHFPC+NCTKA++Q+G+KK+ Y +Y+
Sbjct: 61 VEGHCIRTIHAEVNALLQCAKFGVPTEGREIYVTHFPCVNCTKAIIQSGIKKVYYATDYK 120

Query: 121 PHPFAIELMEAKGVAYVQHDVPEVTL 146 

P+A EL GV Q ++ E+ L 
Sbjct: 121 NSPYAEELFRDAGVDVEQVELEEMIL 146 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3573> which encodes the amino acid
sequence <SEQ ID 3574>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.3155 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 133/146 (91%), Positives = 140/146 (95%) 

Query: 2 NRLSWEDYFMANAELISKRSTCDRAFVGAVLVKNNRIIATGYNGGVSETDNCNEVGHYME 61
NRLSW+DYFMANAELISKRSTCDRAFVGAVLVK+NRIIATGYNGGVS TDNCNE GHYME
Sbjct: 18 NRLSWQDYFMANAELISKRSTCDRAFVGAVLVKDNRIIATGYNGGVSATDNCNEAGHYME 77

Query: 62 DGHCIRTVHAEMNALIQCAKEGISTNNTEIYVTHFPCINCTKALLQAGVKKITYKANYRP 121

DGHCIRTVHAEMNALIQCAKEGIST+ TEIYVTHFPCINCTKALLQAG+ KITYKA+YRP
Sbjct: 78 DGHCIRTVHAEMNALIQCAKEGISTDGTEIYVTHFPCINCTKALLQAGITKITYKAHYRP 137

Query: 122 HPFAIELMEAKGVAYVQHDVPEVTLG 147 

HPFAIELME KGVAYVQHDVP++ LG 
Sbjct: 138 HPFAIELMEKKGVAYVQHDVPQIVLG 163 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1152 

A DNA sequence (GBSx1228) was identified in S.agalactiae <SEQ ID 3575> which encodes the amino
acid sequence <SEQ ID 3576>. Analysis of this protein sequence reveals the following: 

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2454 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1153

A DNA sequence (GBSx1229) was identified in S.agalactiae <SEQ ID 3577> which encodes the amino
acid sequence <SEQ ID 3578>. Analysis of this protein sequence reveals the following: 

Possible site: 25
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.65 Transmembrane 4 - 20 ( 3 - 21)

Final Results

bacterial membrane Certainty=0.1659 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1154 

A DNA sequence (GBSx1230) was identified in S.agalactiae <SEQ ID 3579> which encodes the amino
acid sequence <SEQ ID 3580>. Analysis of this protein sequence reveals the following:

Possible site: 54

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04699 GB:AP001510 unknown conserved protein [Bacillus halodurans]
Identities = 47/94 (50%) , Positives = 65/94 (69%)

Query: 2 LLPVGSVVYLIDGNQKLVIVNRGAIVEQEGQEVYFDYLGGIFPEGLNLEQVYYFNQEDID 61

+LP+GS+VYL +G KL+I+NRG I+E G+ FDY G +P+GL ++V+YFN E+ID 
Sbjct: 1 MLPIGSIVYLKEGTSKLMILNRGPILEANGENKMFDYSGCFYPQGLVPDKVFYFNHENID 60 

Query: 62 EWFEGYHDEEEERVSRLIEKWKNTEGKNLPKGK 95 

EWFEG+ D+EE+R +L WK KGK 
Sbjct: 61 EWFEGFQDDEEQRFQKLFHDWKKENKDRYVKGK 94 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1155 

A DNA sequence (GBSx1231) was identified in S.agalactiae <SEQ ID 3581> which encodes the amino
acid sequence <SEQ ID 3582>. Analysis of this protein sequence reveals the following:

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3560 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1156 

A DNA sequence (GBSx1232) was identified in S.agalactiae <SEQ ID 3583> which encodes the amino
acid sequence <SEQ ID 3584>. This protein is predicted to be elongation factor P (efp). Analysis of this
protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3067 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14376 GB:Z99116 elongation factor P [Bacillus subtilis]

Identities = 89/186 (47%) , Positives = 120/186 (63%) , Gaps = 1/186 (0%) 

Query: 1 MIEASKLKAGMTFETADGKLIRVLEASHHKPGKGNTIMRMKLRDVRTGSTFDTSYRPEEK 60
MI + + G+T + DG + RV++ H KPGKG +R KLR++RTG+ + ++R EK
Sbjct: 1 MISVNDFRTGLTIDV-DGGIWRVVDFQHVKPGKGAAFVRSKLRNLRTGAIQEKTF 59

Query: 61 FEQAIIETVPAQYLYKMDDTAYFMNNETYDQYEIPTVNIENELLYILENSEVKIQFYGTE 120

+A IET QYLY D FM+ +Y+Q E+ IE EL Y+LEN V I Y E 
Sbjct: 60 VAKAQIETKTMQYLYANGDQHVFMDTSSYEQLELSATQIEEELKYLLENMSVHIMMYQDE 119 

Query: 121 VIGVQIPTTVELTVAETQPSIKGATVTGSGKPATMETGLVVNVPDFIEAGQKLVINTAEG 180

+G+++P TVEL V ET+P IKG T +G KPA ETGLVVNVP F+ G LV+NT++G
Sbjct: 120 TLGIELPNTVELKVVETEPGIKGDTASGGTKPAKTETGLVVNVPFFVNEGDTLVVNTSDG 179

Query: 181 TYVSRA 186

+YVSRA 
Sbjct: 180 SYVSRA 185 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3585> which encodes the amino acid 
sequence <SEQ ID 3586>. Analysis of this protein sequence reveals the following:

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1813 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 170/186 (91%) , Positives = 180/186 (96%) , Gaps = 1/186 (0%)

Query: 1 MIEASKLKAGMTFETADGKLIRVLEASHHKPGKGNTIMRMKLRDVRTGSTFDTSYRPEEK 60 

MIEASKLKAGMTFE A+GKLIRVLEASHHKPGKGNTIMRMKLRDVRTGSTFDT+YRP+EK 
Sbjct: 1 MIEASKLKAGMTFE-AEGKLIRVLEASHHKPGKGNTIMRMKLRDVRTGSTFDTTYRPDEK 59

Query: 61 FEQAIIETVPAQYLYKMDDTAYFMNNETYDQYEIPTVNIENELLYILENSEVKIQFYGTE 120
FEQAIIETVPAQYLYKMDDTAYFMN +TYDQYEIP N+E ELLYILENS+VKIQFYG+E
Sbjct: 60 FEQAIIETVPAQYLYKMDDTAYFl^DTYDQYEIP\ffiNVEQELLYILENSDVKIQFYGSE 119

Query: 121 VIGVQIPTTVELTVAETQPSIKGATVTGSGKPATMETGLVVNVPDFIEAGQKLVINTAEG 180
VIGV +PTTVELTVAETQPSIKGATVTGSGKPAT+ETGLVVNVPDFIEAGQKL+INTAEG

Sbjct: 120 VIGVTVPTTVELTVAETQPSIKGATVTGSGKPATLETGLVVNVPDFIEAGQKLIINTAEG 179

Query: 181 TYVSRA 186
TYVSRA

Sbjct: 180 TYVSRA 185

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1157 

A DNA sequence (GBSx1233) was identified in S.agalactiae <SEQ ID 3587> which encodes the amino
acid sequence <SEQ ID 3588>. Analysis of this protein sequence reveals the following: 

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1508 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06505 GB:AP001516 unknown conserved protein [Bacillus halodurans] 
Identities = 42/107 (39%) , Positives = 70/107 (65%) , Gaps = 4/107 (3%) 

Query: 5 NLGEIVISPRVLEVITGIAATKVDGVHSLRNK---AVTDSLSKKSLGRGVYLKNEEDDTV 61

+LG + ISP V+EVI GIAA++V+GV ++R V + L K+ G+GV + + D+ +

Sbjct: 15 DLGRVEISPEVIEVIAGIAASEVEGVATMRGNFAAGVAEKLGYKNHGKGVKV-DLNDEGI 73

Query: 62 AADIYVYLQYGVNVPAVSIAIQQAVKTAVYDMAEVKISSVNIHVEGI 108
D+ V + YGV+VP V+ IQQ +K A+ M +++ S+N+H+ G+
Sbjct: 74 IVDVSVIILYGVSVPEVAKKIQQNIKQALQTMTAIELQSINVHIVGV 120

A related DNA sequence was identified in S.pyogenes <SEQ ID 3589> which encodes the amino acid 
sequence <SEQ ID 3590>. Analysis of this protein sequence reveals the following: 

Possible site: 41
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0882 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 101/129 (78%) , Positives = 113/129 (87%) 

Query: 1 MTTENLGEIVISPRVLEVITGIAATKVDGVHSLRNKAVTDSLSKKSLGRGVYLKNEEDDT 60

MTTE +GEIVISPRVLEVITGIA T+V+GVHSL NK + DS +K SLG+GVYL+ EED +
Sbjct: 1 MTTEYIGEIVISPRVLEVITGIATTQVEGVHSLHNKKMADSFNKASLGKGVYLQTEEDGS 60

Query: 61 VAADIYVYLQYGVNVPAVSIAIQQAVKTAVYDMAEVKISSVNIHVEGIVPEKTPKPDLKS 120
V ADIYVYLQYGV VP VS+ IQ+ VK+AVYDMAEV IS+VNIHVEGIV EKTPKPDLKS

Sbjct: 61 VTADIYVYLQYGVKVPTVSMNIQKTVKSAVYDMAEVPISAVNIHVEGIVAEKTPKPDLKS 120

Query: 121 LFDEDFLDD 129
LFDEDFLDD
Sbjct: 121 LFDEDFLDD 129

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1158 

A DNA sequence (GBSx1234) was identified in S.agalactiae <SEQ ID 3591> which encodes the amino
acid sequence <SEQ ID 3592>. This protein is predicted to be N utilization substance protein B homolog
(nusB). Analysis of this protein sequence reveals the following:

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.32 Transmembrane 48 - 64 ( 47 - 64) 



Final Results 

bacterial membrane Certainty=0.1128 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14363 GB:Z99116 similar to transcription termination 
[Bacillus subtilis] 
Identities = 51/129 (39%) , Positives = 82/129 (63%) , Gaps = 9/129 (6%) 

Query: 9 RRDLRERAFQTLFSLETGGEFIDAAHFAYGYDKTVSEDKVLEVPIFLLNLVNGVVDHKDE 68 

RR RE+A Q LF ++ ++ A + + E+K F LV+GV++H+D+ 

Sbjct: 3 RRTAREKALQALFQIDVSDIAVNEA IEHALDEERT DPFFEQLVHGVLEHQDQ 54 

Query: 69 LDTLISSHLKSGWSLERLTLVDKSLLRLGLYEIKYFDETPDRVALNEIIEIAKKYSDETS 128

LD +IS HL + W L+R+ VD+++LRL YE+ Y ++ P V++NE IE+AK++ D+ + 
Sbjct: 55 LDEMISKHLVN-WKLDRIANVDRAILRLAAYEMAYAEDIPVNVSMNEAIELAKRFGDDKA 113 

Query: 129 AKFVNGLLS 137 

KFVNG+LS 
Sbjct: 114 TKFVNGVLS 122 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3593> which encodes the amino acid 
sequence <SEQ ID 3594>. Analysis of this protein sequence reveals the following: 

Possible site: 44
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.75 Transmembrane 53 - 69 ( 53 - 69)

Final Results

bacterial membrane Certainty=0.1702 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB14363 GB:Z99116 similar to transcription termination 
[Bacillus subtilis] 
Identities = 47/134 (35%) , Positives = 76/134 (56%) , Gaps = 10/134 (7%) 

Query: 15 RRDLRERAFQALFNIEMGAELLAASQFAYGYDKVTGEDAQVLELPIFLLSLVTGVNNHKE 74 

RR RE+A QALF I++ +++ + D+ + F LV GV H++ 

Sbjct: 3 RRTAREKALQALFQIDV- SDIAVNEAIEHALDEEKTDP FFEQLVHGVLEHQD 53 



Query: 75 ELDNLISTHLKKGWSLERLTLTDKTLLRLGLFEIKYFDKTPDRVALNEIIEVVKKYSDET 134 

+LD +IS HL W L+R+ D+ +LRL +E+ Y + P V++NE IE+ K++ D+ 
Sbjct: 54 QLDEMISKHLVN-WKLDRIANVDRAILRLAAYEMAYAEDIPVNVSMNEAIELAKRFGDDK 112

Query: 135 SAKFINGLLSQYVS 148

+ KF+NG+LS S 
Sbjct: 113 ATKFVNGVLSNIKS 126 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 104/142 (73%), Positives = 125/142 (87%), Gaps = 1/142 (0%) 

Query: 1 MTSVFKDSRRDLRERAFQTLFSLETGGEFIDAAHFAYGYDKTVSED-KVLEVPIFLLNLV 59 

MT+ F++SRRDLRERAFQ LF++E G E + A+ FAYGYDK ED +VLE+PIFLL+LV 
Sbjct: 7 MTNSFQNSRRDLRERAFQALFNIEMGAELLAASQFAYGYDKVTGEDAQVLELPIFLLSLV 66 

Query: 60 NGVVDHKDELDTLISSHLKSGWSLERLTLVDKSLLRLGLYEIKYFDETPDRVALNEIIEI 119

GV +HK+ELD LIS+HLK GWSLERLTL DK+LLRLGL+EIKYFD+TPDRVALNEIIE+
Sbjct: 67 TGVNNHKEELDNLISTHLKKGWSLERLTLTDKTLLRLGLFEIKYFDKTPDRVALNEIIEV 126

Query: 120 AKKYSDETSAKFVNGLLSQFIT 141 

KKYSDETSAKF+NGLLSQ+++ 
Sbjct: 127 VKKYSDETSAKFINGLLSQYVS 148 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1159 

A DNA sequence (GBSx1235) was identified in S.agalactiae <SEQ ID 3595> which encodes the amino
acid sequence <SEQ ID 3596>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -2.81 Transmembrane 239 - 255 ( 239 - 255) 

Final Results 

bacterial membrane Certainty=0.2126 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC31628 GB:U46902 ScrR [Streptococcus mutans] 
Identities = 225/320 (70%) , Positives = 273/320 (85%) 

Query: 1 MVAKLTDVAALAGVSPTTVSRVINKKGYLSQKTVTKVNFAMRTLGYKPNNLARSLQGKSA 60

MVAKLTDVA LAGVSPTTVSRVIN+KGYLS+KT+TKV AM+TLGYKPNNLARSLQGKSA
Sbjct: 1 MVAKLTDVAKLAGVSPTTVSRVINRKGYLSEKTITKVQAAMKTLGYKPNNLARSLQGKSA 60

Query: 61 KLIGLIFPNIRNIFYAELIEHLEIELFKHGYKTILCNSEKDPIKEKEYLEMLGANQVDGI 120 

KLIGLIFPNI +IFY+ELIE+LEIELFKHGYK I+CNS+ +P KE++YLEML ANQVDGI 
Sbjct: 61 KLIGLIFPNISHIFYSELIEYLEIELFKHGYKAIICNSQNNPDKERDYLEMLEANQVDGI 120 

Query: 121 ISSSHNLGIDDYEKVEAPIVAFDRNLAPHIPIVSSDNFFGGKMAAQTLKKHGCQKMIMIT 180 

ISSSHNLGIDDYEKV API+AFDRNLAP+IPIVSSDNF GG+MAA+ LKKHGCQ IMI 
Sbjct: 121 ISSSHNLGIDDYEKVSAPIIAFDRNLAPNIPIVSSDNFEGGRMAAKLLKKHGCQHPIMIA 180

Query: 181 GNDNSDSPTGLRRLGFSYESKESICVITVTNGLSNMRREMELKSIISTHKPDGIFTSDDLT 240 

G DNS+SPT LR+LGF ++ + ++ LS +R+EME+K 1+ KPDGIF SDD+T 

Sbjct: 181 GKDNSNSPTALRQLGFKSVFAQAPIFHLSGELSIIRKEMEIKVILQNEKPDGIFLSDDMT 240 

Query: 241 ALLVIKLISQLGLSIPEDIKVIGYDGTSFIQDYVPHLTTIKQPIREIAQLMVEILLAKIE 300 

A+L +K+ +QL ++IP ++K+IGYDGT F+++Y P+LTTI+QPI++IA L+V+ILL KI+ 
Sbjct: 241 AILTMKIANQIiNITIPHELKIIGYDGTHFVENYYPYLTTIRQPIKDIAHLLVDILLKKID 300 

Query: 301 GQKTNKDYILPVSLIPGSSV 320 

Q KDYILPV L+ G SV 
Sbjct: 301 HQDIPKDYILPVGLLSGESV 320 
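
The summary line that heads each alignment ("Identities = ..., Positives = ..., Gaps = ...") can likewise be read programmatically. This is a minimal Python sketch under stated assumptions: `parse_summary` and the `FIELD` pattern are hypothetical names, and the code only extracts the raw counts. It makes no attempt to reproduce the rounding used for the printed percentages.

```python
import re
from typing import Dict, Tuple

# Matches "Identities = 226/321", "Positives = 269/321", "Gaps = 1/321".
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line: str) -> Dict[str, Tuple[int, int]]:
    """Map each field name (lower-cased) to its (count, alignment_length)."""
    return {
        name.lower(): (int(count), int(length))
        for name, count, length in FIELD.findall(line)
    }


header = "Identities = 226/321 (70%) , Positives = 269/321 (83%) , Gaps = 1/321 (0%)"
summary = parse_summary(header)
identity_fraction = summary["identities"][0] / summary["identities"][1]
print(summary, round(identity_fraction, 3))
```

A header without a "Gaps" field simply yields a dictionary without the `gaps` key, so callers should use `summary.get("gaps")` when that field is optional.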

A related DNA sequence was identified in S.pyogenes <SEQ ID 3597> which encodes the amino acid 
sequence <SEQ ID 3598>. Analysis of this protein sequence reveals the following: 

Possible site: 20 
>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAC31628 GB:U46902 ScrR [Streptococcus mutans] 
Identities = 226/321 (70%) , Positives = 269/321 (83%) , Gaps = 1/321 (0%) 

Query: 1 vVAKLTDVAALAGVSPTWSRVINKKGYLSQKTTOKVNKT^RELGYKPNNLARSLCGKST 60 
+VAKLTDVA LAGVSPTTVSRVTN+KGYLS+KT+ KV AM+ LGYKPNNLARSLQGKS 
Sbjct: 1 WAKLTDVAKLAGVSPTWSRVINRKGYLSEKTITKVC^yiKTLGYKPNNLARSLQGKSA 60 

Query: 61 QLIGLIFPNISNIFYAELIEHLEIELFKQGYKTIICNSEHNPVKEREYLEMLAANQVDGI 120 
+LIGLIFPNIS+IFY+ELIE+LEIELFK GYK IICNS++NP KER+YLEML ANQVDGI 
Sbjct: 61 KLIGLIFPNISHIFYSELIEYLEIELFKHGYKAIICNSQNNPDKERDYLEMLEANQVDGI 120 

Query: 121 ISSSHNLGIEDYERVEAPIVAFDRNrAPNIPVISSDNFEGGKIAAQTLQKHGCQNIVMIT 180 
ISSSHNLGI+DYE+V API+AFDRNLAPNIP++SSDNFEGG++AA+ L+KHGCQ+ +MI 
Sbjct: 121 ISSSHNLGIDDYEKVSAPIIAFDRNLAPNIPIVSSDNFEGGRMAAKLLKKHGCQHPIMIA 180 

Query: 181 GNDNSDSPTGLRQLGFNYQLKRSAEIIKLPNNLSPVRREMEIKSILATRKPDGLFVSDDL 240 
G DNS+SPT LRQLGF + AIL LS +R+EMEIK IL KPDG+F+SDD+ 
Sbjct: 181 GKDNSNSPTALRQLGFK-SVFAQAPIFHLSGELSIIRKEMEIKVILQNEKPDGIFLSDDM 239 

Query: 241 TAILIMKVAKQLHITIPEDMKVIGYDGTTFIQQYVPQLATIRQPIDEIAKLSVEILIKKI 300 
TAIL MK+A QL+ITIP ++K+IGYDGT F++ Y P L TIRQPI +IA L V+IL+KKI 
Sbjct: 240 TAILTMKIANQLNITIPHELKIIGYDGTHFVENYYPYLTTIRQPIKDIAHLLVDILLKKI 299 

Query: 301 KKEKTSKDYILPITLLPGASI 321 
+ KDYILP+ LL G S+ 
Sbjct: 300 DHQDIPKDYILPVGLLSGESV 320 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 247/321 (76%) , Positives = 293/321 (90%) , Gaps = 1/321 (0%) 

Query: 1 IWAKLTDVAAI^GVSPTTVSRVINKKGYLSQKTOTKVNEAMRTLGYKPNNLARSLQGKSA 60 
+VAKLTDVAALAGVSPTTVSRVINKKGYLSQKTV KVN+AMR LGYKPNNLARSLQGKS 
Sbjct: 1 WAKLTDVAAIAGVSPTWSRVINKKGYLSQKTVNKVNKAMRELGYKPNNLARSLQGKST 60 

Query: 61 KLIGLIFPNIRNIFYAELIEHLEIELFKHGYKTILCNSEKDPIKEKEYLEMLGANQVDGI 120 
+LIGLIFPNI NIFYAELIEHLEIELFK GYKTI+CNSE +P+KE+EYLEML ANQVDGI 
Sbjct: 61 QLIGLIFPNISNIFYAELIEHLEIELFKQGYKTIICNSEHNPVKEREYLEMLAANQVDGI 120 

Query: 121 ISSSHNLGIDDYEKVEAPIVAFDRNLAPHIPIVSSDNFFGGKMAAQTLKKHGCQKMIMIT 180 
ISSSHNLGI+DYE+VEAPIVAFDRNLAP+IP++SSDNF GGK+AAQTL+KHGCQ ++MIT 
Sbjct: 121 ISSSHNLGIEDYERVEAPIVAFDRNLAPNIPVISSDNFEGGKLAAQTLQKHGCQNIVMIT 180 

Query: 181 GNDNSDSPTGLRRLGFSYESKES-KVITVTNGLSNMRREMELKSIISTHKPDGIFTSDDL 239 
GNDNSDSPTGLR+LGF+Y+ K S ++I + N LS +RREME+KSI++T KPDG+F SDDL 
Sbjct: 181 GNDNSDSPTGLRQLGFNYQLKRSAEIIKLPNNLSPVRREMEIKSILATRKPDGLFVSDDL 240 

Query: 240 TALLVIKLISQLGLSIPEDIKVIGYDGTSFIQDYVPHLTTIKQPIREIAQLMVEILLAKI 299 
TA+L++K+ QL ++IPED+KVIGYDGT+FIQ YVP L TI+QPI EIA+L VEIL+ KI 
Sbjct: 241 TAILIMKVAKQLHITIPEDMKVIGYDGTTFIQQYVPQLATIRQPIDEIAKLSVEILIKKI 300 

Query: 300 EGQKTNKDYILPVSLIPGSSV 320 
+ +KT+KDYILP++L+PG+S+ 
Sbjct: 301 KKEKTSKDYILPITLLPGASI 321 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1160 

A DNA sequence (GBSx1236) was identified in S.agalactiae <SEQ ID 3599> which encodes the amino 
acid sequence <SEQ ID 3600>. This protein is predicted to be sucrose-6-phosphate hydrolase (cscA). 
Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4775 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA35872 GB:X51507 sucrose -6 -phosphate hydrolase [Streptococcus mutans] 
Identities = 303/479 (63%) , Positives = 359/479 (74%) , Gaps = 25/479 (5%) 



Query: 1 MmPTEIRYRPYDEWTEEDKENIVKNVSKSPVSRATYHLEAKTGLLNDPNGFSYFNGKFHL 60 
MNLP IRYR Y +WTEE+ ++I NV+ SPW TYH+E KTGLLNDPNGFSYFNGKF+L 
Sbjct: 1 MNLPQNIRYRRYQDWTEEEIKSIKTNVALSPWHTTYHIEPKTGLIJslDPNGFSYFNGKFNL 60 

Query: 61 FYQNWPFGAAHGLKQWTOTESDDLVHFKETGIKLKPDHVNDSHGAYSGSALAIDDKLFLF 120 
FYQNWPFGAAHGLK W+HTES +DLVHFKETG L PD +DSHGAYSGSA I D+LFLF 
Sbjct: 61 FYQ^PFGAAHGLK3WIHTESEDLVHFKETGTVLYPDTSHDSHGAYSGSAYEIGDQLFLF 120 

Query: 121 YTGNVRDMKWTODPRQIGAWMTNDGKITKFDK^ 180 
YTGNVRD W R P QIGA+M G I KF VLI QPNDVTEHFRDPQIFNY QFYA+ 
Sbjct: 121 YTGNVRDFJWVRHPLQIGAFMDKKGNIQKFTDVLIKQPNDVTEHFRDPQIFNYKGQFYAI 180 

Query: 181 IGAQNSKKCGFIK1YKALNNDIHHWEFVGDLDFGGTGSEYMIECPNIIFVKGKPVLLYSP 240 
+GAQ+ LDFGG+ SEYMIECPN++F+ +PVL+YSP 
Sbjct: 181 VGAQS LDFGGSKSEYMIECPNLVFINEQPVLIYSP 215 

Query: 241 QGLDKNELDYQNIYPNTYKIGQYFDANSSKIVEPSPIYNLDYGFEAYATQGFNTSDGRAF 300 
QGL K+ELDY NIYPNTYK+ Q FD +V+ S I NLD+GFE YATQ FN DGR + 
Sbjct: 216 QGLSKSELDYHNIYPNTYKVCQSFDTEKPALVDASEIQNLDFGFECYATQAFNAPDGRVY 275 

Query: 301 IVSWIGLPDIDYPSDQFDYQGAMSLVKELSIKNGNLYQYPVPAMKNLRQHQAEFKTQLQT 360 
VSWIGLPDIDYPSD +DYQGA+SLVKELS+K+G LYQYPV A+++LR + + +T 
Sbjct: 276 AVSWIGLPDIDYPSDSYDYQGALSLVKELSLKHGKLYQYPVEAVRSLRSEKEAVTYKPET 335 

Query: 361 NOTYELELLVPRNDLSSFVLFANPKGQGLSITIDTVKGKVIIDRSQAGQQYATEFGTSRQ 420 
NNTYELEL + ++ +LFA+ KG GL+IT+DT G ++IDRS+AG+QYA EFG+ R 
Sbjct: 336 NNTYELELTFDSSSVNELLLFADNKGNGLAITVDTKMGTILIDRSKAGEQYALEFGSQRS 395 

Query: 421 CDIPKDATSINIFIDKS1FEIFINKGEKVFTGRVFPDAEQSGIQLKEGHVHGKYFELKY 479 
CI T +NI F+DKS I FE I FINKGEKVFTGRVFP+ +Q+GI +K G G Y+ELKY 
Sbjct: 396 CS I QAKETWNI FVDKS I FEI FINKGEKVFTGRVFPNDKQTGI VI KSGKPSGNYYELKY 454 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3601> which encodes the amino acid 
sequence <SEQ ID 3602>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4629 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 288/479 (60%) , Positives = 367/479 (76%) 

Query: 1 MNLPTEIRYRPYDEWTEEDKENIVKNVSKSPWRATYHLKAKTGLIiNDPNGFSYENGKFHIi 60 
M+LP IRYRPY EW+ +D + I + +++SPW + +H+E KTGLLNDPNGFSYFNG++HL 
Sbjct: 2 MDLPQAIRYRPYKEWSSKDYQAITEKMAQSPWHSQFHVEPKTGLLNDPNGFSYFNGRYHL 61 

Query: 61 FYQNWPFGAAHGLKQWVHTESDDLVHFKETGIKLKPDHVMJSHGAYSGSALAIDDKLFLF 120 
FYQNWP+GAAHGLKQWVH S DLVHF ET +L PDH +DSHGAYSGSA AIDDKLFLF 
Sbjct: 62 FYQNWPYGAAHGLKQWVHMTSTDLVHFTETRSRLLPDHAHDSHGAYSGSAYAIDDKLFLF 121 

Query: 121 YTGNTODMKMRDPRQIGAmTNDGKITKFDK^ISQPNDVTEHFRDPQIFNYDNQFYAV 180 
YTGNVRD W R P Q+GAWM G I+K +VLI QP+DVTEHFRDPQ+F+Y QFYA+ 
Sbjct: 122 YTGNVRDANWVRTPLQVGAWMDKQGNISKIPQVLIEQPDDVTEHFRDPQLFSYQGQFYAI 181 

Query: 181 IGAQNSKKCGFIKLYKALNNDIHHWEFVGDLDFGGTGSEYMIECPMIIFVKGKPVLLYSP 240 
IGAQ G IKLYKA++N + +W F+ DLDF +G+EYMIECPN++FV KPVL++SP 
Sbjct: 182 IGAQGLDGKGKIKLYKAVDNHVDNWRFIADLDFDDSGTEYMIECPNLVFVDDKPVLIFSP 241 

Query: 241 QGLDKNELDYQN1YPNTYK1GQYFDANSSKIVEPSPIYNLDYGFEAYATQGFNTSDGRAF 300 
QGL K +LDYQNIYPNTYKI + F+ + +++ + NLD+GFEAYATQ F++ DGR 
Sbjct: 242 C<3LAKADLDYQNIYPNTYKIFESFNPETGQIjLGGGALQNLDFGFEAYATQAFSSPDGRVL 301 

Query: 301 IVSWIGLPDIDYPSDQFDYQGANISLVKELSIKNGNLYQYPVPAMKNLRQHQAEFKTQLQT 360 
VSWIGLPDIDYP+D++DYQGA+SLVKEL IK+G LYQ PV A++NLR F ++ + 
Sbjct: 302 AVSWIGLPDIDYPTDRYDYQGALSLVKELRIKDGILYQTPVSALQNLRGPAELFHNKIDS 361 

Query: 361 NNTYELELLVPRNDLSSFVLFANPKGQ^LSITIDTVKGKVIIDRSQAGQQYATEFGTSRQ 420 
+N YELEL +P +LFA+ KG GL + +DT KG++ IDRS+AG QYA ++GT R 
Sbjct: 362 SNCYELELTIPGQKKLDLLLFADQKGMGLRLKVDTTKGQLSIDRSRAGVQYAQDYGTVRS 421 

Query: 421 CDIPKDATSINI FIDKS I FE I F INKGEKVFTGRVFPDREQSGI QLKEGHVHGKYFELKY 479 
C IP+ ++N+++D SI EIFIN+G+KV T RVFP Q+GIQ+ EG G Y+E++Y 
Sbjct: 422 CQIPQGHVTLNVYVDNSILEIFINQGQKVLTSRVFPTHGQTGIQVVEGQAFGHYYEMRY 480 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1161 

A DNA sequence (GBSx1237) was identified in S.agalactiae <SEQ ID 3603> which encodes the amino 
acid sequence <SEQ ID 3604>. Analysis of this protein sequence reveals the following: 



Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2204 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1162 

A DNA sequence (GBSx1238) was identified in S.agalactiae <SEQ ID 3605> which encodes the amino 
acid sequence <SEQ ID 3606>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 



INTEGRAL Likelihood = -7.64 Transmembrane 259 - 275 ( 250 - 283) 
INTEGRAL Likelihood = -4.41 Transmembrane 113 - 129 ( 109 - 130) 
INTEGRAL Likelihood = -3.03 Transmembrane 180 - 196 ( 180 - 196) 
INTEGRAL Likelihood = -3.03 Transmembrane 439 - 455 ( 438 - 456) 
INTEGRAL Likelihood = -2.81 Transmembrane 298 - 314 ( 298 - 317) 
INTEGRAL Likelihood = -2.02 Transmembrane 396 - 412 ( 395 - 412) 



Final Results 

bacterial membrane Certainty=0.4057 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC99320 GB:AF059741 sucrose-specific PTS permease [Clostridium 
beijerinckii] 

Identities = 235/453 (51%) , Positives = 312/453 (67%) , Gaps = 15/453 (3%) 

Query: 7 IAKQVINAIGGASNTOSVAHCATRLRVMVKDEWIDKNTVENIEKVQGAFFNSGQYQIIF 66 

+AK+++ IGG N++SV HCATRLR+++ D+ I++ +ENI+ V+G FF++ QYQII 
Sbjct: 6 VAKEILENIGGKENIKSVEHCATRLRLILNDKEKINEKAIENIDGVKGQFFSAAQYQIIL 65 

Query: 67 GTGTVNKIYDEWAQGLPTSSTSDQKAEAAKQGNAFQRAIRTFGDVFVPLLPAIVATGLF 126 

GTG VN++YD +V Q T + K EA Q Q+ RTFGD VFVP+ + P +VATGLF 

Sbjct: 66 GTGFVNE VYDVI VGQNSDLV-TGNNKEEAYSQMTLIQKI SRTFGDVFVPI I PVLVATGLF 124 

Query: 127 MGIRGAINITOTVIjALFGTTSKAFSSSNFYTYTVVIiTDTAFAFFPALISWSAFRVFGGNPV 186 

MG+RG + N V + NF +T VLTDTAFAF PAL++WS + FGG PV 

Sbjct: 125 MGLRGLLTNLGVQM NENFVLFTQVLTDTAFAFLPALVAWSTMKKFGGTPV 174 

Query: 187 IGLVLGLMMVNSALPNAWAVASGDAHPI KF - - FGF - 1 P WGYQNSVLPAFFVGLLGAKLE 243 

IG+V+GLM+V+ +LPNA+AVA+G A PI G IPWGYQ SVLPA +G++ AK + 

Sbjct: 175 IGIVIGLMLVSPSLPNAYAVAAGTATPINLTILGLNIPWGYQGSVLPALVLGIIAAKTQ 234 

Query: 244 KWLHKKIPDVLDLLLVPFLTFTVMSILALFVIGPIFHSVFJSTYVIjAGTKFVIjNLPLGLSGL 303 

K L K +PDVLDL++ PF+T +L L ++GPI H+ E + K + LP GL GL 

Sbjct: 235 KALKKWPDVLDLIVTPFITLLFSMVLGLLIVGPIMHNAEQLIFGAIKGFMGLPFGLGGL 294 

Query: 304 ILGGVHQI IVWGVHHIFNLLFAQLIAADGKDPFNAIITAAMTAQAGATIjAVGVKTKNKK 363 

++GGVHQ+IWTGVHH N LE +L+++ GKD FNA+IT + AQ A LAV VKTK+KK 
Sbjct: 295 VVGGVHQLIVVTGVHHALNALEVELLSSTGKDAFNAMITCGIVAQGAAALAVAVKTKDKK 354 

Query: 364 LKALAFPAALSAGLGITEPAIFGVNLRFGKPFIMGLIAGAAGGWLASILKLAGTGFGITI 423 

++L +A+ A LGITEPAI FGVNLRF KPFI G GA GG L+ IL LAGTG GIT 
Sbjct: 355 KRSLYISSAIPAFLGITEPAIFGVNLRFIKPFIFGCAGGAVGGMLSGILHLAGTGMGITA 414 

Query: 424 IPGTLLYLNGQIVKYLIMVIGTTSLAFVLTYMF 456 

+PG LLY+N + Y+++ + ++AF LT F 
Sbjct: 415 LPGMLLYVN-NLGSYILVNWAIAVAFCLTLFF 446 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3607> which encodes the amino acid 
sequence <SEQ ID 3608>. Analysis of this protein sequence reveals the following: 

Possible site: 26 
>>> Seems to have no N-terminal signal sequence 



INTEGRAL Likelihood = -4.99 Transmembrane 111 - 127 ( 108 - 129) 
INTEGRAL Likelihood = -4.57 Transmembrane 176 - 192 ( 176 - 193) 
INTEGRAL Likelihood = -4.35 Transmembrane 436 - 452 ( 431 - 453) 
INTEGRAL Likelihood = -3.88 Transmembrane 295 - 311 ( 293 - 314) 
INTEGRAL Likelihood = -3.50 Transmembrane 259 - 275 ( 253 - 277) 
INTEGRAL Likelihood = -2.07 Transmembrane 405 - 421 ( 405 - 421) 
INTEGRAL Likelihood = -0.43 Transmembrane 219 - 235 ( 219 - 235) 



Final Results 

bacterial membrane Certainty=0.2996 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC99320 GB:AF059741 sucrose-specific PTS permease [Clostridium 
beijerinckii] 

Identities = 234/451 (51%) , Positives = 312/451 (68%) , Gaps = 11/451 (2%) 

Query: 1 MDNRQIAAEVIFJU^GGRENWSVAHCATRI^vMvYDEGKIDKEKAEAIDKVKGAFFNSGQ 60 
M + +A E++E +GG+EN++SV HCATRLR+++ D+ KI+++ E ID VKG FF++ Q 
Sbjct: 1 MKEQIVAKEILENIGGKENIKSVEHCATRLRLILNDKEKINEKAIENIDGVKGQFFSAAQ 60 

Query: 61 YQMI FGTGTVNNI YDEWALGLPTSSTSEQKAEAGKHGNI FQRAIRTFGDVFVPI I PAIV 120 
YQ+I GTG VN +YD +V T K EA + Q+ RTFGDVFVPIIP +V 
Sbjct: 61 YQIILGTGFVNEVYDVIVGQNSDLV-TGNNKEEAYSQMTLIQKISRTFGDVFVPIIPVLV 119 

Query: 121 ATGLFMGVRGLVTQPAIMDLFGVHEYGENFLMYTRILTDTAFVYLPALVAWSAFRVFGGN 180 
ATGLFMG+RGL+T + + ENF+++T++LTDTAF +LPALVAWS + FGG 
Sbjct: 120 ATGLFMGLRGLLTNLGV QMNENFVLFTQVLTDTAFAFLPALVAWSTMKKFGGT 172 

Query: 181 PIIGIvLGLMLVSNELPNAWWASGGDVK-PLTFFGF-VPWGYQGTVLPAFFVGLVGAK 238 
P+IGIV+GLMLVS LPNA+ VA+G LT G +PWGYQG+VLPA +G++ AK 
Sbjct: 173 PVIGIVIGLMLVSPSLPNAYAVAAGTATPINLTILGLNIPWGYQGSVLPALVLGIIAAK 232 

Query: 239 LEKOTiHKKVPEALDLLOTPFLTFAIMSTLGLFVIGPVFHSLENLVIAGTQAvIiHLPFGIA 298 
+K L K VP+ LDL+VTPF+T LGL ++GP+ H+ E L+ + + LPFG+ 
Sbjct: 233 TQKALKKWPDVLDLI VTPFITLLFSMVLGLLIVGPIMHNAEQLIFGAIKGFMGLPFGLG 292 

Query: 299 GLIVGGIQQLIvvTGIHHIFNFLEAQLIANTGKDPFNAYLTAATAAOJW^^ 358 
GL+VGG+ QLIWTG+HH N LE +L+++TGKD FNA +T AQ A LAVAVKTK 
Sbjct: 293 GLWGGVHQLIVVTGVHHALNALEVELLSSTGKDAFNAMITCGIVAQGAAALAVAVKT^ 352 

Query: 359 TKLKGLAFPSTLSALLGITEPAIFGVNLRYPKVFVSGLIGGALGGWVAGLFGIAGTGFGI 418 
K + L S + A LGITEPAIFGVNLR+ K F+ G GGA+GG ++G+ +AGTG GI 
Sbjct: 353 KKKRSLYISSAIPAFLGITEPAIFGVNLRFIKPFIFGCAGGAVGGMLSGILHLAGTGMGI 412 

Query: 419 TVLPGTLLYLNGQLLQYLVTMLVGLGVAFAI 449 
T LPG LLY+N L Y++ +V + VAF + 
Sbjct: 413 TALPGMLLYVN-NLGSYILVNWAIAVAFCL 442 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 409/618 (66%) , Positives = 491/618 (79%) , Gaps = 12/618 (1%) 

Query: 4 NTEIAKQVINAIGGASNVRSVAHCATRLRVMVKDETVIDKNTVENIEKVQGAFFNSGQYQ 63 
N +IA +VI A+GG NVRSVAHCATRLRVMV DE IDK E I+KV+GAFFNSGQYQ 
Sbjct: 3 NRQIAAEVIEALGGRENVRSVAHCATRLRVMvYDEGKIDKEKAEAIDKVKGAFFNSGQYQ 62 

Query: 64 IIFGTGTVNKIYDEWAQGLPTSSTSDQKAEAAKQGNAFQRAIRTFGDVFVPLLPAIVAT 123 
+IFGTGTVN IYDEWA GLPTSSTS+QKAEA K GN FQRAIRTFGDVFVP++PAIVAT 
Sbjct: 63 MIFGTGTVNNIYDEWALGLPTSSTSEQKAEAGKHGNIFQRAIRTFGDVFVPIIPAIVAT 122 

Query: 124 GLFMGIRGAINNDTVLALFGTTSKAFSSSNFYTYTVVLTDTAFAFFPALISWSAFRVFGG 183 
GLFMG+RG + ++ LFG NF YT +LTDTAF + PAL++WSAFRVFGG 
Sbjct: 123 GLFMGVRGLVTQPAIMDLFGVHEYG ENFLMYTRILTDTAFVYLPALVAWSAFRVFGG 179 

Query: 184 NPVIGLVLGLMMVNSALPNAWAVASG-DAHPIKFFGFIPWGYQNSVLPAFFVGLLGAKL 242 
NP+IG+VLGLM+V++ LPNAW VASG D P+ FFGF+PWGYQ +VLPAFFVGL+GAKL 
Sbjct: 180 NPIIGIVLGLMLVSNELPNAWWASGGDVKPLTFFGWPWGYQGTvLPAFFVGLVGAKL 239 

Query: 243 EKWLHKKI PD VLDLLLVPFLTFTvMS ILALFVIGPI FHSVENYVLAGTKFVLNLPLGLSG 302 
EKWLHKK+P+ LDLL+ PFLTF +MS L LFVIGP+FHS +EN VLAGT+ VL+LP G++G 
Sbjct: 240 EKWLHKKVPEALDLLVTPFLTFAIMSTLGLFVIGPVPHSLENLVLAGTQAVLHLPFGIAG 299 

Query: 303 LILGGWQIIVVTGVHHIFNLLEAQLIAaDGKDPFNAIITAAMTAQAGATIAVGVKTKNK 362 
LI+GG+ Q+IWTG+HHIFN LEAQLIA GKDPFNA +TAA AQAGATLAV VKTK+ 
Sbjct: 300 LIVGGIQQLIVVTGIHHIFNFLiEAQLIANTGKDPFNAYLTAATAAQAGATLAVAVKTKST 359 

Query: 363 KLKALAFPAALSAGLGITEPAIFGVNLRFGKPFIMGLIAGAAGGWLASILKLAGTGFGIT 422 
KLK IAFP+ LSA LGITEPAI FGVNLR+ K F+ GLI GA GGW+A + +AGTGFGIT 
Sbjct: 360 KLKGLAFPSTLSALLGITEPAIFGVNLRYPKVFVSGLIGGALGGWVAGLFGIAGTGFGIT 419 

Query: 423 IIPGTLLYIaNGQIVKYLIMVIGTTSLAFVLTYMFGYEDKDEKAVAEVSPLVEETDDDPTI 482 
++PGTLLYLNGQ+++YL+ ++ +AF + Y +GY+D++ + V V++T D P + 
Sbjct: 420 

Query: 483 
ET+ SPL+G V+ L VSDPVFSSG MG GLAIKP NT+YSPVDG V+I F 
Sbjct: 478 -ETLYSPLNGTWDLSAVSDPVFSSGAMGQGLAIKPEDNTLYSPVDGKVEIVF 531 

Query: 543 
ETGHAY I S +GAE+L+HIGIDT +M G GF S V Q VKKGD+LG FD +KIAEAG 
Sbjct: 532 ETGHAYAITSSQGAEVLLHIGIDTESMAGDGFESLVAVGQAVKKGDLLGHFDPSKIAEAG 591 

Query: 603 LDNTAMI IVTNTADFADV 620 
LD+T M+IV+N AD+ V 
Sbjct: 592 LDDTTMMIVSNIADYQSV 609 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1163 

A DNA sequence (GBSx1239) was identified in S.agalactiae <SEQ ID 3609> which encodes the amino 
acid sequence <SEQ ID 3610>. This protein is predicted to be fructokinase. Analysis of this protein 
sequence reveals the following: 



Possible site: 18 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.2436 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA02467 GB:D13175 fructokinase [Streptococcus mutans] 
Identities = 232/291 (79%) , Positives = 257/291 (87%) 

Query: 1 MTKLYGSIFJVGGTKFVCAVGDEELKVVEKMQFPTTTPQETIKKTVDFFKRFEKKLEAVAI 60 
M+KLYGSIEAGGTKFVCAVGDE +++EK+QFPTTTP ETI+KTV FFK+FE L +VAI 
Sbjct: 1 MSKLYGS IFAGGTKFVCAVGDENFQILEJCVrjFPTTTPYETIEKTVAFFKKFEADLASVAI 60 

Query: 61 GSFGPIDIDKKSKTYGYITTTPKLHWflNVDIiLGLISKDFNVPFYFTTDVNSSAYGEVIAR 120 
GSFGPIDID+ S TYGYIT+TPK +WANVD +GLISKDF +PFYFTTDVNSSAYGE IAR 
Sbjct: 61 

Query: 121 
+N+ SLVYYTTGTGIGAGAIQ GEFIGG GHTEAGH YMA HP D + F G CPFH C 
Sbjct: 121 

Query: 181 
LEGLA+GP+LEARTGIRGELIE+NS VWD+QAYYIAQAAIQATVLYRPQVIVFGGGVMAQ 
Sbjct: 181 

Query: 241 
EHML RVR+ F +LLN YLPVPD+ DYIVTPA+ ENGSATLGN ALAKKI+ 
Sbjct: 241 EHMimWEKFTSLLNDYLPVPDVKDYIVTPAVftENGSATLGNLALAKKIA 291 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3611> which encodes the amino acid 
sequence <SEQ ID 3612>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2012 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 212/293 (72%) , Positives = 246/293 (83%) 



Query: 1 MTKLYGSIEAGGTKFVCAVGDEELKOTEKMQFPTTTPQETIKKTVDFFKRFEKKLEAVAI 60 
M KLYGS I EAGGTKFVCAVGDEE W+K QFPTTTP+ETI +T+ +FK FE L +AI 
Sbjct: 1 MGKLYGSIEAGGTKFVCAVGDEEFTVVDKTQFPTTTPEETIARTIAYFKAFEADIAGMAI 60 

Query: 61 GSFGPIDIDKKSKTYGYITTTPKLHWANVDLLGLISKDFNVPFYFTTDVNSSAYGEVIAR 120 
GSFGPIDID S+TYGYITTTPK WANVDLLG +S F +PF TTDVNSSAYGEV+AR 
Sbjct: 61 GSFGPIDIDPSSETYGYITTTPKSGWANVDIiLGQLSAAFKIPFDVTTDVNSSAYGEVLAR 120 

Query: 121 NNIDSLVYYTIGTGIGAGAIQKGEFIGGTGHTEftGHTYMAMHPQDQANDFKGICPFHNSC 180 
++SLVYYTIGTGIGAGAIQ G FIGG GHTEAGHTY+ HP D A F G+CPFH C 
Sbjct: 121 PGVESLVYYTIGTGIGAGAIQHGHFIGGLGHTEAGHTYVMPHPDDMAKGFLGVCPFHKGC 180 

Query: 181 LEGIASGPTLEARTGIRGELIEENSMvTOVQAYYIAQAAIQATVLYRPQVIVFGGGVMAQ 240 
LEG+A+GP++EARTG+RGE +++ + VWD+QA+YIAQAA+QAT+LYRPQVI VFGGGVMAQ 
Sbjct: 181 LEGMAAGPSIEARTGWGERLDQEADVTOICi^IAQRALQATOLYR 240 

Query: 241 EHMLRRVRQTFATLLNGYLPVPDLSDYI VTPAIEENGSATLGNFALAKKI S KG 293 
EHM+ RV F LL+GYLPVPDL+DYIVTPA+ +NGSATLGNFALAK ++G 
Sbjct: 241 EHMVLRVHDKFTALLSGYLPVPDLTDYIOTPAVM3NGSATLGNFALAKLAAQG 293 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1164 

A DNA sequence (GBSx1240) was identified in S.agalactiae <SEQ ID 3613> which encodes the amino 
acid sequence <SEQ ID 3614>. This protein is predicted to be Mannosephosphate Isomerase (pmi). 
Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4717 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA04021 GB:D16594 Mannosephosphate Isomerase [Streptococcus mutans] 
Identities = 232/312 (74%) , Positives = 262/312 (83%) 

Query: 1 mSEPLFLEASMHDKIWGGTKLRDEFGYDIPSETTGEYWAISAHPNGVSRVKNGRFKGCFIj 60 

M PLFL++ MH KIWGG +LR EFGYDIPSETTGEYWAISAHPNGVS VKNG +KG L 
Sbjct: 1 MEGPLFLQSQMHKKIWGGNRLRKEFGYDIPSETTGEYWAISAHPNGVSvVKNGVYKGVPL 60 

Query: 61 DKLYQGEKSLFGNPDDTVFPLLTKILDANDWLSVQVHPDDAYALKHEGELGKTECWYIIS 120 
D+Jj 1 + J_ir \jN +ve FJjIj 1 1S.X JjDAIM.UW.ufc> Vy VllFLJ+ii i.H±j+lliioJiJJjl\.l iik,W x + J. fc> 
Sbjct: 61 DELYAEHRELFGNSKSSVFPLLTKILDflNDWLSVQVHPDNAYALEHEGELGKTECWYVIS 120 

Query: 121 ADEGSEIIYGHNAKTKEELRQMIESGDWEHLLTRIPVKSGDFYYVPSGTMHAIGKGILIL 180 
Sbjct: 121 ADEGAEIIYGHEAKSKEELRQMIA^^ 180 

Query: 181 ETQQSSDTTYRWDFDRPDASGKLRDLHI^ 240 
iilWWfc»fc>iJl J- iKVxUriJK JJ t3+ K ijn±liyfc)XlJVJjl J.oJSJt'/iW trA + Jj+ Jj + 1+1jV 
Sbjct: 181 ETQQSSDTTYRVYDFDRKDDQGRKRALHIEQSIDVLTIGKPANATPAWLSLQGLETTVLV 240 

Query: 241 SMDFFTOTKWEISGVTOFKQFAPYLLVSVLDGAGHITVDNKOTTLKKGDHFILP1TOWKW 300 
S+ FFTVYKW+ 1 SG +Q APYLLVSVL G G ITV + Y L+KGDH ILPN + W 
Sbjct: 241 SSPFFTOTKWQISGSVKMQQTAPYLLVSVLAGQGRITVGLEQYALRKGDHLILPNTIKSW 300 

Query: 301 DIDGQLEI IASH 312 
DG LEI IASH 
Sbjct: 301 QFDGDLEI IASH 312 





A related DNA sequence was identified in S.pyogenes <SEQ ID 3615> which encodes the amino acid 
sequence <SEQ ID 3616>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3714 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 232/312 (74%) , Positives = 264/312 (84%) 



Query: 1 MSEPLFLEASMHDKIWGGTKLRDEFGYDIPSETTGEYWAISAHPNGVSRVKNGRFKGCFL 60 
MSEPLFL+++MHD+IWGGTKLRD F Y+IPS+TTGEYWAISAHPNGVS V NGR++G L 
Sbjct: 1 MSEPLFLKSTMHDRIWGGTKLRDVFAYNIPSDTTGEYWAISAHPNGVSTVTNGRYQGQPL 60 

Query: 61 DKLYC^EKSLFGNPDDTVFPLLTKILDANDWLSVQVHPDDAYALKHEGELGKTECWYIIS 120 
+ LY E +LFGNP + VFPLLTKILDANDWLSVQVHPDDAY +HEGELGKTECWYI IS 
Sbjct: 61 NTLYAQEPALFGNPKEEVFPLLTKILDANDWLSVQVHPDDAYGREHEGELGKTECWYIIS 120 

Query: 121 ADEGSEIIYGHNAKTKEELRQMIESGDWEHLLTRIPVKSGDFYYVPSGTMHAIGKGILIL 180 
A+EGSE I +YGH AK+KE+LR MIE+G W+ LLTR+PVK+GDF+YVPSGTMHAIGKGILIL 
Sbjct: 121 AEEGSEIVYGHQAKSKEDLRAMIEAGAWDDLLTRVPVKAGDFFYVPSGTMHAIGKGILIL 180 

Query: 181 ETQQSSDTTYRVYDFDRPDASGKLRDLHIEQSIDVLTIGKPANTVPANMKLKHLSSTLLV 240 
ETQQSSDTTYRVYDFDR D +G LRDLHIE+SIDVLTIGKP N+VPA M L ++ +T LV 
Sbjct: 181 ETQQSSDTTYRVYDFDRKDVNGNLRDLHIEKSIDVLTIGKPENSVPATMVLDNMVATTLV 240 

Query: 241 SNDFFTWKWEISGVTNFKQFAPYLLVSVLDGAGHITVDNKVYTLKKGDHFILPNDW 300 
S FFTVYKW S + + KQ APYLLVSVL G G + VD K Y L+KG HFILPNDV W 
Sbjct: 241 STPFFTWKWOTSQMVDMKQAAPYLLVSVLKGQGKLYVDQKAYELEKGMHFILPNDVKSW 300 

Query: 301 DIDGQLEI IASH 312 
DGQLE+I SH 
Sbjct: 301 SFDGQLEMIVSH 312 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1165 

A DNA sequence (GBSx1241) was identified in S.agalactiae <SEQ ID 3617> which encodes the amino 
acid sequence <SEQ ID 3618>. This protein is predicted to be preprotein translocase SecA subunit (secA). 
Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1102 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10107> which encodes amino acid sequence 
<SEQ ID 10108> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA50286 GB:L32090 secA [Listeria monocytogenes] 
Identities = 503/843 (59%) , Positives = 643/843 (75%) , Gaps = 16/843 (1%) 



Query: 11 MANILRTOIENDKGELKKLDKIAKKVDSYADHMAALSDEALQAKTPEFKERYQNGETLDQ 70 
MA +L+ + E+ K ++K L++ A ++ + AD AALSD+AL+ KT EFKER Q GETLD 
Sbjct: 1 60 

Query: 71 LLPEAFAVVREASKRVLGLYPYHVQIMGGIVLHHGDIPEMRTGEGKTLTATMPVYLMAIS 130 
LL EAFAV RE +KR LGLYP+ VQ+MGGIVLH +1 EM+ TGEGKTLTAT + PVYLNA+ S 
Sbjct: 61 T T TnrST?ST77iOtr/^7iin3ST f'T VDT7 WfYT MrTT^fT TJTTn'NTT A UMTf l TPT?r , Tf TT I" 1 A IT WrVT MET C 
IjJLj VciAr.Av.rtKii(J/ilAJ</^LxLi a iwyiJrlvavjX Vi-irliilJl\ ±/^lMlS.10Jil3l\.lijliil J_ilr V iJJ.v/Ujo 120 

Query: 131 GLGVHVITVNEYLSTRDATEMGEWSV^GLSVGINLAAKSPFEKREAYNCDITYSTNAEV 190 
G GVHV+TVNEYL+ RDA EMG +Y++LGLSVG+NL A S EKREAY CDITYSTN E+ 
Sbjct: 121 /rcVI TiX\ T\ 7*n 7"MTPVT AUD'TlA'n'TT'MmTT VMirr PT GTZ^T KTT ATAT C GT'TTTrDT? AVA r^Tl T TV G TTvTNTT? T 180 

Query: 191 nRnvT.T?r)Mvr\n?Rripr)M\7'n'R PTATvaTAn^p\7T^^n,TnT?&PTPT.TvcinpvPQT^ 
uTUI UlvUlNl'l v V ixyXLUl'l V ^I\.ir J-llN Ifiu Vi_/JC> VUO V UJ.UEtnIs.1 c Li X V Dv7ir V O O JirJl\^XJ X L jXtiXJvi 250 
GFDYLRDNMW +E+MVQRPL +A++DEVDS+L+DEARTPLI+SG + + LY RA+ 
Sbjct: 181 GFDYLRDNMWYKEEMVQRPLAFAVIDEVDSILVDEARTPLIISGE-AEKSTILYVRANT 239 


Query: 251 FVKTL-NSDDYIIDVPTKTIGLSDTGIDKAENYFHIMSILYDLENVALTHYIDNALRANYI 309 
FV+TL +DY +D+ TK++ L++ G+ K ENYF + NL+DLEN + H+I AL+ANY 
Sbjct: 240 FVRTLTEEEDYTVDIKTKSVQLTEDGMTKGENYFDVENLFDLENTVILHHIAQALKANYT 299 

Query: 310 MLLNIDYWSEEQEILIVDQFTGRTMEGRRFSDGLHQAIEAKESVPIQEESKTSASITYQ 369 
M L++DYW ++ E+LIVDQFTGR M+GRRFS+GLHQA+EAKE V IQ ESKT A+IT+Q 
Sbjct: 300 MSLDVDYW-QDDEVLIVDQFTGRIMKGRRFSEGLHQALEAKEGVTIQNESKTMATITFQ 358 

Query: 370 NMFRMYHKLAGMTGTGKTEEEEFREIYNMRVIPIPTNRPVQRIDHSDLLYPTLDSKFRAV 429 
N FRMY KLAGMTGT KTEEEEFR+ IYNMRVI IPTN+ + R D DL+Y T+++KF AV 
Sbjct: 359 NYFRMYKKIiAGMTGTAICrEEEEFRDIYNMRVIEIPTNKVIIRDDRPDLIYTTMEAKFNAV 418 

Query: 430 VADVKERYEQGQPVLVGTVAVETSDLISRKLVAAGVPHEVLNAKNHFKEAQIIMNAGQRG 489 
V D+ ER+ +GQPVLVGTVA+ +LIS KL G+ H+VLNAK H +EA II +AG+RG 
Sbjct: 419 VED IAERHAKGQPVLVGTVAMNI - ELI SSKLKRKGI KHDVLNAKQHEREAD 1 1 KHAGERG 477 

Query: 490 AVTIATNMAGRGTDIKLGEGVRELGGLCVIGTERHESRRIDNQLRGRSGRQGDPGESQFY 549 
AV IATNMAGRGTDIKLGEG E GGL VIGTERHESRRIDNQLRGRSGRQGDPG +QFY 
Sbjct: 478 AWIATNMAGRGTDIKLGEGTIEAGGLAVIGTERHESRRIDNQLRGRSGRQGDPGVTQFY 537 

Query: 550 LSLEDDLMRRFGTDRI KWLERMNLAEDDTVI KSKMLTRQVESAQRRVEGNNYDTRKQVL 609 
LS+ED+LMRRFG+D +K ++ER +AED I+SKM++R VESAQRRVEGNN+D+RKQVL 
Sbjct: 538 LSMEDELMRRFGSDNMKSMMERFGMAED - -AIQSKMVSRAVESAQRRVEGNNFDSRKQVL 595 

Query: 610 QYDDVMREQREIIYANRREVITAERDLGPELKGMIKRTIKRAVDAHSRSDKNTAA- - -EA 666 
QYDDV+R+QRE+IY R EVI AE L ++ MI+RT+ V +++ S + A + 
Sbjct: 596 QYDDVLRQQREVIYKQRYEVINAENSLREIIEQMIQRTVNFIVSSNASSHEPEEAWNLQG 655 

Query: 667 IWFARSALLDEEAITVSELRGLKEAEIKELLYERALAVYEQQIAKLKDPEAIIEFQKVL 726 
+ LL E IT+ +L+ +1+ L+ ++ A Y+++ L PE EF+KV+ 
Sbjct: 656 IIDYVDANLLPEGTITLEDLQNRTSEDIQNLILDKIKaAYDEK-ETLLPPEEFNEFEKW 714 

Query: 727 ILMWDNQWTEHIDALDQLRNSVGLRGYAQNNPIVEYQSEGFRMFQDMIGSIEFDVTRTL 786 
+L WD +W +HIDA+D LR+ + LR Y Q +P+ EYQSEGF MF+ M+ SI+ DV R + 
Sbjct: 715 LLRWDTKWVDHIDAMDHLRDGIHLRAYGQIDPLREYQSEGFEMFEAMVSSIDEDVARYI 774 

Query: 787 MKAQIHEQ - ERER - ASQHATTTAEQNI SAQHVPMNNES PEYQGI KRNDKCPCGSGMKFKN 844 
MKA+I + ERE+ A A AE A+ P+ + Q I RND CPCGSG K+KN 
Sbjct: 775 MKAEIRQNLEREQVAKGEAINPAEGKPEAKRQPIRKD QHIGRNDPCPCGSGKKYKN 830 

Query: 845 CHG 847 
CHG 
Sbjct: 831 CHG 833 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3619> which encodes the amino acid 
sequence <SEQ ID 3620>. Analysis of this protein sequence reveals the following: 

Possible site: 43 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4443 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 710/837 (84%) , Positives = 777/837 (92%) , Gaps = 3/837 (0%) 

30 Query: 11 MANILRWIENDKGELKKLDKIAKKVDSYADHMAALSDEALQAKTPEFKERYQNGETLDQ 70 

MANILR VIENDKGEL+KL+KIAKKV+SYAD MA+LSD LQ KT EFKERYQ GETL+Q 
Sbjct: 1 MANIIiRKVIE!©KGELRKLEKIAKKVESYADQMASLSDRDLQGKTLEFKERYQKGETLEQ 60 

Query: 71 LLPEAFAWREASKRVLGLYPYHVQIMGGIVLHHGDIPEMRTGEGKTLTATMPVYLNAIS 130 
35 LLPEAFAWREA+KRVLGL+PY VQIMGGIVLH+GD+PEMRTGEGKTLTATMPVYLNAI + 

Sbjct: 61 LLPEAFAVTOEAAKRVLGLFPYRVQIMGGIVLHNGDVPEMRTGEGKTLTATMPVYLNAIA 120 

Query: 131 GLGVHVITVNEYLSTRDATEMGEVYSWLGLSVGINLAAKSPFEKREAYNCDITYS'TNAEV 190 
G GVHVITVNEYLSTRDATEMGEVYSWLGLSVGINLAAKSP EKREAYNCDITYSTN+EV 
40 Sbjct: 121 GEGVHVITVNEYLSTRDATEMGEVYSVnjGLSVGINIiAAKSPAEKREAYNCDITYSTNSEV 180 

Query: 191 GFDYLRDNMVWQEDMVQRPLNYALVDEVDSVLIDEARTPLIVSGPVSSEMNQLYTRADM 250 

GFDYLRDNMWRQEDMVQRPLN+ALVDEVDSVLIDEARTPLIVSG VSSE NQLY RADM 
Sbjct: 181 GFDYLRDNMVTOQEDMVQRPLNFALVDEVDSVLIDEARTPIjIVSGAVSSETNQLYIRADM 240 

45 

Query: 251 FVKTIJ^SDDYIIDVPTKTIGLSDTGIDKAENYFHLNNLYDLENVALTHYIDNALRANYIM 310 

FVKTIi S DY+IDVPTKTIGLSD+GIDKAE+YF+L+NLYD+ENVALTH+IDNALRANYIM 
Sbjct: 241 FVKTLTSVDYVIDVPTKTIGLSDSGIDKAESYFNLSNLYDIENVALTHFIDNALRANYIM 300 

50 Query: 311 LLNIDYWSEEQEILIVDQFTGRTMEGRRFSDGLHQAIEAKESVPIQEESKTSASITYQN 370 

LL+IDYWSE+ EILIVDQFTGRTMEGRRFSDGLHQAIEAKE V IQEESKTSASITYQN 
Sbjct: 301 LLDIDYWSEDGEILIVDQFTGRTMEGRRFSDGLHQAIEAKEGVRIQEESKTSASITYQN 360 

Query: 371 MFRMYHIOiAGMTGTGKTEEEEFREIYNMRVIPIPTNRPVQRIDHSDLLYPTLDSKFRAVV 430 
MFRMY KLAGMTGT KTEEEEFRE+YNMR+IPIPTNRP+ RIDH+DLLYPTIi+SKFRAW

Sbjct: 361 MFRMYKKLAGMTGTAKTEEEEFREVYNMRIIPIPTNRPIARIDHTDLLYPTLESKFRAVV 420 

Query: 431 AD VKERYEQGQPVLVGTVAVETSDLI SRKLVAAGVPHEVLNAKNHFKEAQI IMNAGQRGA 490 
DVK R+ +GQP+LVGTVAVETSDLISRKLV AG+PHE VIjNAKNHFKEAQI IMNAGQRGA 
Sbjct: 421 EDVKTRHAKGQPILVGTOAVETSDLISRKLvEAGIPHEVLNAKNHFKEAQIIMNAGQRGA 480

Query: 491 VTIATNMAGRGTDIKLGEGVRELGGLCVIGTERHESRRIDNQLRGRSGRQGDPGESQFYL 550 

VTIAlNMAGRGTDIKLGEGVRELGGLCVIGTERHESRRIDNQLRGRSGRQGDPGESQFYIj 
Sbjct: 481 VTIATNMAGRGTDIKLGEGVRELGGLCVIGTERHESRRIDNQLRGRSGRQGDPGESQFYL 540 

Query: 551 SLEDDLMRRFGTDRIKOTLERMNLAEDDTVIKSKMLTRQVESAQRRVEGNNYDTRKQVLQ 610

SLEDDLMRRFG+DRIK L+RM L E+DTVIKS ML RQVESAQ+RVEGNNYDTRKQVLQ 
Sbjct: 541 SLEDDLMRRFGSDRIKAFLDRMKLDEEDTVIKSGMLGRQVESAQKRVEGNNYDTRKQVLQ 600 

Query: 611 YDDVMREQREIIYANRREVITAERDLGPELKGMIKRTIKRAVDAHSRSDKNTAAEAIVNF 670 

YDDVMREQREIIYANRR+VITA RDLGPE+K MIKRTI RAVDAH+RS++ A +AIV F 
Sbjct: 601 YDDVMREQREIIYANRRDVITANRDLGPEIKAMIKRTIDRAVDAHARSNRKDAIDAIVTF 660 

Query: 671 ARSALLDEEAIWSELRGLKEAEIKELLYERAI^VYEQQIAKLKDPEAIIEFQKVLILMV 730 

AR++L+ EE 1+ ELRGLK+ +IKE LY+RALA+Y+QQ++KL+D EAIIEFQKVLILM+ 
Sbjct: 661 ARTSLVPEEFISAKELRGLKDDQIKEKLYQRALAIYDQQLSKLRDQEAIIEFQKVLILMI 720 

Query: 731 VDNQWTEHIDALDQLRNSVGLRGYAQNNPIVEYQSEGFRMFQDMIGSIEFDVTRTLMKAQ 790 

VDN+WTEHIDALDQLRN+VGLRGYAQNNP+VEYQ+EGF+MFQDMIG+IEFDVTRT+MKAQ 
Sbjct: 721 VDNKWTEHIDALDQLRNAVGLRGYAQNNPVVEYQAEGFKMFQDMIGAIEFDVTRTMMKAQ 780 

Query: 791 IHEQERERASQHATTTAEQNISAQHVPMNNESPEYQGIKRNDKCPCGSGMKFKNCHG 847 

IHEQERERASQ ATT A QNI +Q ++ P+ ++RN+ CPCGSG KFKNCHG 
Sbjct: 781 IHEQERERASQRATTAAPQNIQSQQSANTDDLPK VERNEACPCGSGKKFKNCHG 834 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
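
The alignment summary lines reported in these Examples (for instance "Identities = 710/837 (84%), Positives = 777/837 (92%), Gaps = 3/837 (0%)") follow a regular pattern and can be tabulated mechanically. The following is a minimal Python sketch offered for illustration only; it is not part of the original analysis. The function name `parse_alignment_summary` and the regular expression are assumptions introduced here, and the percentages are read as reported rather than recomputed.

```python
import re

# Illustrative pattern (an assumption, not from the original text): matches
# "Identities = 710/837 (84%)", "Positives = ..." and "Gaps = ..." fields,
# tolerating the stray spaces that appear in the scanned listings.
SUMMARY_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_summary(line):
    """Return {field: (count, aligned_length, reported_percent)} for each field found."""
    return {
        name.lower(): (int(count), int(length), int(percent))
        for name, count, length, percent in SUMMARY_FIELD.findall(line)
    }


# Summary line of the GAS/GBS alignment shown above.
summary = parse_alignment_summary(
    "Identities = 710/837 (84%) , Positives = 777/837 (92%) , Gaps = 3/837 (0%)"
)
print(summary)
```

Summary lines that omit the Gaps field simply yield a dictionary without a `gaps` key.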

Example 1166 

A DNA sequence (GBSx1242) was identified in S. agalactiae <SEQ ID 3621> which encodes the amino
acid sequence <SEQ ID 3622>. This protein is predicted to be phospho-2-dehydro-3-deoxyheptonate 
aldolase (aroH). Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3429 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF40753 GB:AE002387 phospho-2-dehydro-3-deoxyheptonate
aldolase, phe-sensitive [Neisseria meningitidis MC58]
Identities = 122/348 (35%) , Positives = 187/348 (53%) , Gaps = 32/348 (9%) 

Query: 1 MGFHQLSDKINIEILKQKTSLDLEVSQKKLAKE EELKNIIKGEDQRFLVIV 51 

M H +D I 1+ +K+ + + ++KE +E+ +++ G D+R LVI+ 

Sbjct: 1 MTHHYPTDDIKIKEVKELLPPIAHLYELPISKEASGLVHRTRQEISDLVHGRDKRLLVII 60 

Query: 52 GPCSADNPKAVIiTYAKRLAKLEAAFKDKMFLVMRvYTAKPRTNGDGYKGLVHHSDKLGVF 111 

GPCS +PKA L YA+RL KL +++++ +VMRVY KPRT G+KGL++ G F 

Sbjct: 61 GPCSIHDPKAALEYAERLLKLRKQYENELljIvMRvYFEKPRTT-VGVilKGLINDPHLDGTF 119 

Query: 112 FQARKMHYDIIRETGLLTADELLYPEMLSVMDDLVSYYAIGARSVEDQGHRFIS 165 

QAR + + G+ + E L DL+S+ AIGAR+ E Q HR ++ 

Sbjct: 120 DINFGLRQARSLLLS-1MNMGMPASTEFLDMITPQYYADLISWGAIGARTTESQVHRELA 178 

Query: 166 SGIDAPVGMKNPTSGNLRVMFNAVYAAQNQQELFYQNKQ VRTDGNLLSHVI LRGY 220 

SG+ PVG KN T GNL++ +A+ AA + K V T GN HVILRG 

Sbjct: 179 SGLSCPVGFKNGTDGNLKIAIDAIGAASHSHHFLSVTKAGHSAIVHTGGNPDCHVILRGG 238 

Query: 221 HNADYRSIPNYHYENLLETITHYEETDI^NPFIvVDTNHDNSGKQFLEQIRIvKSvIjADR 280 

PNY E++ E + + +++D +H NS K + Q+ + + + A 

Sbjct: 239 KE PNYDAEHVSEAAEQLRAAGVTDK-LMIDCSHANSRKDYTRQMEVAQDIAAQL 291 



Query: 281 QWHTKIRNYVRGFLIESYLEDGRQDKPDVFGKSITDPCLGWDKTEMLl 328
+ + + G ++ES+L +GRQDKP+V+GKSITD C+GW TE L+
Sbjct: 292 E QDGGNIMGVIWESHLvEGRQDKPEVYGKSITDACIGWGATEEIjL 336

A related DNA sequence was identified in S.pyogenes <SEQ ID 3623> which encodes the amino acid 
sequence <SEQ ID 3624>. Analysis of this protein sequence reveals the following: 

Possible site: 57

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1171 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 52/233 (22%) , Positives = 93/233 (39%) , Gaps = 40/233 (17%) 

Query: 50 IVGPCSADNPKAVLTYAKRIAKLEAAFKDKMFLVMRVYTAKPRTNGDGYKGLVHHSDKLG 109

IVGPCS ++ + A KL + R KPRT+ ++GL 
Sbjct: 19 IVGPCSIESYDHIRLAASSAKKLGYNY FRGGAYKPRTSAASFQGLG 64 

Query: 110 VFFQARKMHYDIIRETGLLTADELLYPEMLSVMDDLVSYYAIGARSVEDQGHRFISSGID 169

Q + +++ +E GLL+ E++ L D + +GAR++++ S ID 

Sbjct: 65 --LQGIRYLHEVCQEFGLLSVSEIMSERQLEEAYDYLDVIQVGARNMQNFEFLKTLSHID 122 

Query: 170 APVGMKNPTSGNLRVMFNAVYAAQNQQELFYQNKQVRTDGNLLSHVIL--RGYHNADYRS 227 
P+ K + A+ Q+ + S++IL RG D

Sbjct: 123 KPILFKRGLMATIEEYLGALSYLQDTGK SNIILCERGVRGYD 164 

Query: 228 I PNYHYENLLETITHYEETDLQNPFI VVDTNHDNSGKQ - FLEQIRIVKSVLAD 279 
+ + +++ ++TDL I+VD +H + L +1 K+V A+ 
Sbjct: 165 VETRNMLDIMAVPI IQQKTDLP 1 IVDVSHSTGRRDLLLPAAKIAKAVGAN 214

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1167 

A DNA sequence (GBSx1243) was identified in S. agalactiae <SEQ ID 3625> which encodes the amino
acid sequence <SEQ ID 3626>. This protein is predicted to be AcpS (acpS). Analysis of this protein 
sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.3620 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG22706 GB:AF276617 acyl carrier protein synthase; AcpS
[Streptococcus pneumoniae] 
Identities = 61/117 (52%) , Positives = 90/117 (76%) , Gaps = 1/117 (0%) 

Query: 1 MIVGHGIDLQEIEAITKAYERNQRFAERVLTEQELLLFKGISNPKRQMSFLTGRWAAKEA 60 

MIVGHGID++E+ +1 A R++ FA+RVLT QE+ F + +RQ+ +L GRW+AKEA 
Sbjct: 1 MIVGHGIDIEELASIESAVTRHEGFAKRVLTAQEMERFTSLKG-RRQIEYLAGRWSAKEA 59 

Query: 61 YSKALGTGIGKVNFHDIEILSDDKGAPLITKEPFNGKSFVSISHSGNYAQASVILEE 117

+SKA+GTGI K+ F D+E+L++++GAP ++ PF+GK ++SISH+ + ASVILEE 
Sbjct: 60 FSKAMGTGISKLGFQDLEVIiNNERGAPYFSQAPFSGKIWLSISHTDQFVTASVILEE 116 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3627> which encodes the amino acid
sequence <SEQ ID 3628>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2001 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 76/119 (63%) , Positives = 99/119 (82%) , Gaps = 1/119 (0%) 

Query: 1 MIVGHGIDLQEIEAITKAYERNQRFAERVLTEQELLLFKGISNPKRQMSFLTGRWAAKEA 60 
MIVGHGIDLQEI AI K Y+RN RFA+++LTEQEL +F+ KR++++L GRW+ KEA

Sbjct: 1 MIVGHGIDLQEISAIEKVYQRNPRFAQKILTEQELAIFESFPY-KRRLNYIAGRWSGKEA 59 

Query: 61 YSKALGTGIGKVNFHDIEILSDDKGAPLITKEPFNGKSFVSISHSGNYAQASVILEEEK 119
++KA+GTGIG++ F DIEIL+D +G P++TK PF G SF+SISHSGNY QASVILE++K
Sbjct: 60 FAKAIGTGIGRLTFQDIEILNDVRGCPILTKSPFKGNSFISISHSGNYVQASVILEDKK 118

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1168 

A DNA sequence (GBSx1244) was identified in S. agalactiae <SEQ ID 3629> which encodes the amino
acid sequence <SEQ ID 3630>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.24 Transmembrane 78 - 94 ( 77 - 97)

Final Results

bacterial membrane Certainty=0.2296 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD51027 GB:AF171873 alanine racemase [Streptococcus pneumoniae]
Identities = 227/366 (62%) , Positives = 270/366 (73%) 

Query: 1 MISSYHRPTRALIDLEAIANNVKSVQEHIPSDKKTFAWKANAYGHGAVEVSKYIESIVD 60

M +S HRPT+ALI L AI N++ + HIP AWKANAYGHGAV V+K 1+ VD 

Sbjct: 1 MKASPHRPTKALIHLGAIRQNIQQMGAHIPQGTLKLAWKANAYGHGAVAVAKAIQDDVD 60 

Query: 61 GFCTSNLDFAIELRQAGIVKMILVLGvvMPEQVILAKNFJ^ITLTVASLEWLRLCQTSAVD 120 
GFCVSN+DEAIELRQAG+ K IL+LGV E V LAK + TLTVA LEW++ VD

Sbjct: 61 GFCVSNIDEAIELRQAGLSKPILILGVSEIEAVALAKEYDFTLTVAGLEWIQALLDKEVD 120 

Query: 121 LSGLEVHIKVDSGMGRIGVRQLDEGNKLISELGESGASVKGIFTHFATADEADNCKFNQQ 180 
L+GL VH+K+DSGMGRIG R+ E + L + G V+GIFTHFATADE + FN Q 
Sbjct: 121 LTGLTVHLKIDSGMGRIGFREASEVEQAQDLLQQHGVCVEGIFTHFATADEESDDYFNAQ 180

Query: 181 LTFFKDFISGLDNCPDLVHASNSATSLWHSETIFNAVRLGVVMYGLNPSGTDLDBPYPIN 240 

L FK ++ + P+LVHASNSAT+LWH ETIFNAVR+G MYGLNPSG LDLPY + 
Sbjct: 181 LERFKTIIASMKEvPELvHASNSATTLWHVETIFNAVRMGDAIWGLNPSGAVLDLPYDLI 240 



Query: 241 PALSLESELVHVKQLHDGSQVGYGATYQVTGDEFVGTVPIGYADGWTRDMQGFSVIVNGE 300 

PAL+LES LVHVK + G+ +GYGATYQ ++ + TVPIGYADGWTRDMQ FSV+V+G+ 
Sbjct: 241 PALTLESALVHVKWPAGACMGYGATYQADSEQVIATVPIGYADGWTRDMQNFSVLVDGQ 300 

Query: 301 LCEIIGRVSMDQMTIRLPQKYTIGTKVTLIGQQGSCNITTTDVAQKRQTINYEVLCLLSD 360

C I+GRVSMDQ+TIRLP+ Y +GTKVTLIG G IT T VA R TINYEV+CLLSD 
Sbjct: 301 ACPIVGRVSMDQITIRLPKLYPLGTKVTLIGSNGDKEITATQVATyRVTINYEWCLLSD 360 

Query: 361 RIPRYY 366

RIPR Y 
Sbjct: 361 RIPREY 366 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3631> which encodes the amino acid
sequence <SEQ ID 3632>. Analysis of this protein sequence reveals the following:

Possible site: 41 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.34 Transmembrane 82 - 98 ( 82 - 98)

Final Results

bacterial membrane Certainty=0.1935 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:AAD51027 GB:AF171873 alanine racemase [Streptococcus pneumoniae] 
Identities = 222/366 (60%), Positives = 273/366 (73%)

Query: 1 MISSFHRPWARWLQAIKEmffiSVQKHIPLGVKTYAVVKADAYGHGAVQVSKALLPQVD 60
M +S HRPT A ++L AI++N+ + HIP G AWKA+AYGHGAV V+KA+ VD
Sbjct: 1 MKASPHRPTKALIHLGAIRQNIQQMGAHIPQGTLI<LAWKANAYGHGAVAVAKAIQDDVD 60

Query: 61 GYCTSNLDE&LQLRQAGIDKEILIIjGVI&PNEIjE^ 120
G+CVSN+DEA++LRQAG+ K ILILGV + LA T+T+A L+WI ++ +
Sbjct: 61 GFCTSNIDEaiELRQAGLSKPILIIiGVSEIEAVALAKEYDFTLTVAGLEWIQALLDKEVD 120

Query: 121 CQGLKVHVKVDSGMGRIGLRSSKEVNLLIDSLKELGADVEGIFTHFATADEADDTKFNQQ 180
GL VH+K+DSGMGRIG R + EV D L++ G VEGIFTHFATADE D FN Q
Sbjct: 121 LTGLTVHLKIDSGMGRIGFREASEVEQAQDLLQQHGVCVEGIFTHFATADEESDDYFNAQ 180

Query: 181 LQFFKKLIAGLEDKPRLVHASNSATSIWHSDTIFNAVRLGIVSYGLNPSGSDLSLPFPLQ 240
L+ FK ++A +++ P LVHASNSAT++WH +TIFNAVR+G YGLNPSG+ L LP+ L
Sbjct: 181 LERFKTILASMKEVPELVHASNSATTLWHVETIFNAVRMGDAMYGLNPSGAVLDLPYDLI 240

Query: 241 FALSLESSLVHVKMISAGDTVGYGATYTAKJCSEYVGTVPIGYADGWTRNMQGFSVLVDGQ 300
AL+LES+LVHVK + AG +GYGATY A + + TVPIGYADGWTR+MQ FSVLVDGQ
Sbjct: 241 PALTLESALvHvTCTVPAGACMGYGATYQADSEQVIATVPIGYADGWTRDMQNFSVLVDGQ 300

Query: 301 FCEIIGRVSMDQLTIRLPKRYPLGTKVTLIGSNQQKNISTTDIANYRNTINYEVLCLLSD 360
C I+GRVSMDQ+TIRLPK YPLGTKVTLIGSN K 1+ T +A YR TINYEV+CLLSD
Sbjct: 301 ACPIVGRVSMDQITIRLPKLYPLGTKVTLIGSNGDKEITATQVATYRVTINYEWCLLSD 360

Query: 361 RIPRIY 366
RIPR Y
Sbjct: 361 RIPREY 366

An alignment of the GAS and GBS proteins is shown below. 

Identities = 247/366 (67%), Positives = 295/366 (80%)

Query: 1 MISSYHRPTRALIDLEAIANNVKSVQEHIPSDKKTFAVVKANAYGHGAVEVSKYIESIVD 60
MISS+HRPT A ++L+AI NV SVQ+HIP KT+AWKA+AYGHGAV+VSK + VD
Sbjct: 1 MISSFHRPTVARvNLQAIKFJtfVASVQKHIPLGVTCM 60

Query: 61 GFCVSNLDEAIELRQAGIVKMILVLGVVMPEQVILAKNENITLTVASLEWLRLCQTSAVD 120
G+CVSNLDEA++LRQAGI K IL+LGV++P ++ LA IT+T+ASL+W+ L + +
Sbjct: 61 GYCVSNLDEALQLRQAGIDKEILILGVLLPNELEIAVANAITVTIASLDWIAIoARLEKKE 120

Query: 121 LSGLEVHIKVDSGMGRIGVRQLDEGNKLISELGESGASVKGIFTHFATADEADNCKFNQQ 180
GL+VH+KVDSGMGRIG+R E N LI L E GA V+GIFTHFATADEAD+ KFNQQ
Sbjct: 121 CQGLKAmVCTDSGMGRIGLRSSKEVmLIDSLKELGMVEGIFTHFATADEADDTKFNQQ 180

Query: 181 LTFFKDFISGLDNCPDLVHASNSATSLWHSETIFNAVRLGVVMYGLNPSGTDLDLPYPIN 240
L FFK I+GL++ P LVHASNSATS+WHS+TI FNAVRLG+V YGLNPSG+DL LP+P+
Sbjct: 181 LQFFKKLIAGLEDKPRLVHASNSATSIWHSDTIFNAVRLGIVSYGLNPSGSDLSLPFPLQ 240

Query: 241 PALSLESELVHVKQLHDGSQVGYGATYQVTGDEFVGTVPIGYADGWTRDMQGFSVIVNGE 300
ALSLES LVHVK + G VGYGATY E+VGTVPIGYADGWTR+MQGFSV+V+G+
Sbjct: 241 EALSLESSLVWmiSAGDWGYGATYTAKKSEWGTVPIGYADGWTRNMQGFSVLVDGQ 300

Query: 301 LCEIIGRVSMDQMTIRLPQKYTIGTKVTLIGQQGSCNITTTDVAQKRQTINYEVLCLLSD 360
CEI IGRVSMDQ+TIRLP+ Y +GTKOTLIG NI+TTD+A R TINYEVLCLLSD
Sbjct: 301 FCEIIGRVSMDQLTIRLPKAYPLGTKVTLIGSNQQKNISTTDIANYRNTINYEVLCLLSD 360

Query: 361 RIPRYY 366
RIPR Y
Sbjct: 361 RIPRIY 366

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1169 

A DNA sequence (GBSx1245) was identified in S. agalactiae <SEQ ID 3633> which encodes the amino
acid sequence <SEQ ID 3634>. This protein is predicted to be immunogenic secreted protein precursor. 
Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ ID 1988. 

A related GBS gene <SEQ ID 8745> and protein <SEQ ID 8746> were also identified. Analysis of this 
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 4 
McG: Discrim Score: 8.81 
GvH: Signal Score (-7.5): 0.659999 
Possible site: 27 
>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 0 value: 1.06 threshold: 0.0 
PERIPHERAL Likelihood = 1.06 247 
modified ALOM score: -0.71 

*** Reasoning Step: 3

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear)

SEQ ID 8746 (GBS98) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 7 (lane 5; MW 80kDa). 

GBS98-His was purified as shown in Figure 192, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
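
The "Final Results" localization lines in these Examples share a fixed layout (location, certainty value, and an "Affirmative" or "Not Clear" call). The following is a minimal Python sketch for extracting them from the text; it is offered for illustration only and is not part of the original analysis. The function name `parse_final_results` and the regular expression are assumptions introduced here, and the pattern deliberately tolerates stray whitespace inside the certainty value, as can occur in scanned copies.

```python
import re

# Illustrative pattern (an assumption, not from the original text). It captures
# the location, the certainty value (possibly containing stray spaces) and the call.
RESULT_LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*"
    r"([0-9.\s]+?)\s*\((Affirmative|Not Clear)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, call) tuples in order of appearance."""
    results = []
    for location, certainty, call in RESULT_LINE.findall(text):
        # Remove any whitespace inside the number, e.g. "0. 3000" -> "0.3000".
        results.append((location, float(re.sub(r"\s+", "", certainty)), call))
    return results


# Lines as they appear for SEQ ID 3634, including scanning-induced spaces.
example = """
bacterial outside Certainty=0. 3000 (Affirmative) < succ>
bacterial membrane Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < succ>
"""
results = parse_final_results(example)
print(results)
```

The location with the highest certainty can then be selected with `max(results, key=lambda r: r[1])`.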

Example 1170 

A DNA sequence (GBSx1246) was identified in S. agalactiae <SEQ ID 3635> which encodes the amino
acid sequence <SEQ ID 3636>. This protein is predicted to be junction specific DNA helicase (mmsA) 
(recG). Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.16 Transmembrane 530 - 546 ( 530 - 546) 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA90280 GB:Z49988 MmsA [Streptococcus pneumoniae] 
Identities = 483/671 (71%) , Positives = 568/671 (83%) 

Query: 1 MLLQSPISNLKGFGPKSAEKFQKLDIYTVEDLLLYYPFRYEDFKSKSVFDLVDGEKAVIT 60 

ML P+ L G GPKSAEK+ KL I ++DLLLY+PFRYEDFK+K V +L DGEKAV++ 
Sbjct: 1 MNLHQPLHVLPGVGPKSAEKYAKLGIENLQDLLLYFPFRYEDFKTKQVLELEDGEKAVLS 60 

Query: 61 GLVVTPANVQYYGFKRNRLSFKLRQGEAVLNVSFFNQPYLADKIELGQEVAVFGKWDATK 120 

G WTPA+VQYYGFKRNRL F L+QGE V V+FFNQPYLADKIELG +AVFGKWD K 
Sbjct: 61 GQWTPASVQYYGFKRNRLRFSLKQGEWFAVNFFNQPYIJU)KIEIX^TIjAWGKWDRAK 120 

Query: 121 SAITGMKvIAQVEDDMQPVYRVAQGISQSTLIKAIKSAFEISAHLELKENLPATLLEKYR 180 

+++TGMKVLAQVEDD+QPVYR+AQGISQ++L+K IK+AF+ L ++ENLP +LL+KY+ 
Sbjct: 121 ASLTGMKVLAQVEDDLQPVYRLAQGISQASLVKVIKTAFDQGLDLLIEENLPQSLLDKYK 180 

Query: 181 IMGRSQACLAMHFPKDITEYKQALRRIKFEELFYFQMNLQVLKSENKSETNGLPILYSKH 240 

LM R OA AMHFPKD+ EYKQALRRIKF ELFYFQM LQ LKSEN+ + +GL + +S+ 
Sbjct: 181 LMSRCQAvRAMHFPKDLAEYKQALRRIKFAELFYFQMQLQTLKSENRVQGSGLVLWWSQE 240 

Query: 241 AMETKISSLPFILTNAQKRSLDEILSDMSSGAHMNRLLQGDVGSGKTVIAGLSMYAAYTA 300 

+ +SLPF LT AQ++SL EIL+DM S HMNRLLQGDVGSGKTV+AGL+M+AA TA 
Sbjct: 241 KVTAVKASLPFALTQAQEKSLQEILTDMKSDHHMNRLLQGDVGSGKTWAGLAMFAAVTA 300 

Query: 301 GFQSALMVPTEILAEQHYISLQELFPDLSIAILTSGMKAAVKRTVLAAIANGSVDMIVGT 360 

G+Q+ALMVPTE I LAEQH+ SLQ LFP+L +A+LT +KAA KR VL IA G D+I+GT 
Sbjct: 301 GYQAALMVPTEILAEQHFESLQNLFPNLKLALLTGSLKAAEKREVLETIAKGEADLIIGT 360 

Query: 361 HALIQDSVQYHKLGLVITDEQHRFGVKQRRIFREKGENPDVLMMTATPIPRTLAITAFGE 420 

HALIQD V+Y +LGL+I DEQHRFGV QRRI REKG+NPDVLMMTATPIPRTLAITAFG+ 
Sbjct: 361 HALIQDGVEYARLGLIIIDEQHRFGVGQRRILREKGDNPDVLMMTATPIPRTLAITAFGD 420 

Query: 421 MDVSIIDELPAGRKPIITRWKHEQLGTVLEWVKGELQKDAQVYVISPLIEESEALDLKN 480 

MDVS I ID++PAGRKPI +TRW+KHEQL VL W++GE+QK +Q YVISPLIEESEALDLKN 
Sbjct: 421 MDVSIIDQMPAGRKPIVTRWIKHEQLPQVLTWLEGEIQKGSQAYVISPLIEESEALDLKN 480 

Query: 481 AVALHAELSTYFEGIAKVALVHGRMKNDEKDAIMQDFKDKKSHILVSTTVIEVGVNVPNA 540 

A+AL EL+T+F G A+VAL+HGRMK+DEKD IMQDFK++K+ ILVSTTVIEVGVNVPNA 
Sbjct: 481 AIALSEELTTHFAGKAEVALLHGRMKSDEKDQIMQDFKERKTDILVSTTVIEVGVKVPNA 540 

Query: 541 TIMIIMDADRFGLSQLHQLRGRVGRGYKQSYAVLVANPKTDSGKKRMTIMTETTDGFVLA 600 

T+MI IMDADRFGLSQLHQLRGRVGRG KQSYAVLVANPKTDSGK RM IMTETT+GFVLA 
Sbjct: 541 TVMIIMDADRFGLSQLHQLRGRVGRGDKQSYAVLVANPKTDSGKDRMRIMTETTNGFVLA 600 



Final Results

bacterial membrane Certainty=0.1065 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



Query: 601 ESDLKMRGSGEIFGTRQSGIPEFQVADIVEDYPILEEARRVASDIVKDNNWKENTEWALI 660 

E DLKMRGSGE I FGTRQSG+ PEFQVADI +ED+PILEEAR+VAS I W+E+ EW +1 

Sbjct: 601 EEDLKMRGSGEIFGTRQSGLPEFQVADIIEDFPILEEARKVASYISSIEAWQEDPEWRMI 660 

Query: 661 LDNLRQHSDFD 671

+L + D 
Sbjct: 661 ALHLEKKEHLD 671 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3637> which encodes the amino acid 
sequence <SEQ ID 3638>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.16 Transmembrane 530 - 546 ( 530 - 546)

Final Results

bacterial membrane Certainty=0.1065 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 641/671 (95%), Positives = 655/671 (97%) 

Query: 1 MLLQSPISNLKGFGPKSAEKFQKLDIYTVEDLLLYYPFRYEDFKSKSVFDLVDGEKAVIT 60

M+L +P+SNLKGFGPKSAEKFQKLDIYTVEDLLLYYPFRYEDFKSKSVFDLVDGEKAVIT 
Sbjct: 1 MILTAPMSNLKGFGPKSAEKFQKLDIYTVEDLLLYYPFRYEDFKSKSVFDLVDGEKAVIT 60 

Query: 61 GLVVTPANVQYYGFKRNRLSFKLRQGFAVLJWSFFNQPYLADKIELGQEVAVFGKWDATK 120 
GLVWPANVQYYGFKRmLSFKLRQGFAVlNVSFFNQPYLADKIELGQEVAVFGKWDATK

Sbjct: 61 GLVVTPANVQYYGFKRNRLSFKLRQGFAVIWSFFNQPY1ADKIELGQEVAVFGKWDATK 120 

Query: 121 SAITGMKVIAQVEDDMQPVYRVAQGISQSTLIKAIKSAFEISAHLELKENLPATLLEKYR 180 
SAITGMKVLAQVEDDMQPVYRVAQGISQSTLIKAIKSAFEI AHLELKENLPATLLEKYR 
Sbjct: 121 SAITGMKVIAQVEDDMQPVYRVACjGISQSTLIKAIKSAFEinAHLELKENLPATLLEKYR 180

Query: 181 LMGRSQACLAMHFPKDITEYKQALRRIKFEELFYFQMNLQVLKSENKSETNGLPILYSKH 240 

LMGRSQACLAMHFPKDITEYKQALRRIKFEELFYFQMNLQVLK+ENKSETNGLPILYSK 
Sbjct: 181 LMGRSQACLAMHFPKDITEYKQALRRIKFEBLFYFQMNLQVLKAENKSETNGLPILYSKR 240 

Query: 241 AMETKISSLPFILTNAQKRSLDEILSDMSSGAH^^NRLLCGDVGSGKTVIAGLSMYAAYTA 300 

AMETKISSLPFILTNAQKRSLD+ILSDMSSGAHMNRLLQGDVGSGKTVIAGLSMYAAYTA 
Sbjct: 241 AMETKI S SLPF ILTNAQKRSLDD I LSDMSSGAHMNRLLQGDVGSGKTVI AGLSMYAAYTA 300 

Query: 301 GFQSALWPTEILAEQHYISLQELFPDLSIAILTSGMKAAVKRTVLAAIANGSVDMIVGT 360

GFQSALMVPTEIIiAEQHYISLQELFPDLSIAILTSGMKAAVKRTVLAAIANGSVDMIVGT 
Sbjct: 301 GFQSAL^W•PTEILAEQHYISLQELFPDLSIAILTSGMKAAVKRTVLAAIANGSVDMIVGT 360 

Query: 361 HALIQDSVQYHKLGLVITDEQHRFGVKQRRIFREKGENPDVLMMTATPIPRTLAITAFGE 420 
HAL I QDSVQYHKLGLVI TDEQHRFGVKQRRI FREKGENPDVLMMTATPIPRTLAITAFGE

Sbjct: 361 HALIQDSVQYHKLGLVITDEQHRFGVKQRRIFREKGENPDVLMMTATPIPRTLAITAFGE 420 

Query: 421 MDVSIIDELPAGRKPIITRWKHEQLGTVLEWVKGELQKDAQVYVISPLIEESEALDLKN 480 
MDVSIIDELPAGRKPI+TRWKHEQLGTVLEVfVKGELQKDAQvYVISPLIEESEALDLKN 
Sbjct: 421 MDVSIIDELPAGRKPIMTRWVKHEQLGTVLEWVKGELQKDAQVYVISPLIEESEALDLKN 480

Query: 481 AVALHAELSTYFEGIAKVALVHGRMKNDEKTJIAIMQDFKDKKSHILVSTTVIEVGVNVPNA 540 

AVALHAELSTYFEGIAKVALVHGRMKNDEKDAIMQDFKDKKSHILVSTTVIEVGVNVPNA 
Sbjct: 481 AVALHAELSTYFEGIAKVALVHGRMKNDEKDAIMQDFKDKKSHILVSTTVIEVGVNVPNA 540 

Query: 541 TIMIIMDADRFGLSQLHQLRGRVGRGYKQSYAVLVANPKTDSGKKRMTIMTETTDGFVLA 600 

TIMIIMDADRFGLSQLHQLRGRVGRGYKQSYAVLVANPKTDSGKKRMTIMTETTDGFVLA 
Sbjct: 541 TIMIIMDADRFGLSQLHQLRGRVGRGYKQSYAVLVANPKTDSGKKRMTIMTETTDGFVLA 600 

Query: 601 ESDLKMRGSGEIFGTRQSGIPEFQVADIVEDYPILEEARRVASDIVKDNNWKENTEWALI 660

ESDLKMRGSGEIFGTRQSGIPEFQVADIVEDYPILEEAR+V++ IV D NW +W L+ 
Sbjct: 601 ESDLKMRGSGEIFGTRQSGIPEFQVADIVEDYPILEEARKVSAAIVSDPNWIYEKQWQLV 660 

Query: 661 LDNLRQHSDFD 671 
N+R+ +D
Sbjct: 661 AQNIRKKEVYD 671

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1171

A DNA sequence (GBSx1247) was identified in S. agalactiae <SEQ ID 3639> which encodes the amino
acid sequence <SEQ ID 3640>. This protein is predicted to be aryl-alcohol dehydrogenase (M647). 
Analysis of this protein sequence reveals the following: 

Possible site: 50 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1562 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10105> which encodes amino acid sequence <SEQ ID
10106> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07646 GB:AP001520 aryl-alcohol dehydrogenase [Bacillus halodurans]

Identities = 173/300 (57%), Positives = 224/300 (74%)

Query: 7 IGQTGIQJVTRIALGCMRMSDLKBKQAEEWGTALDLGINFFDHADIYGGGLSELRFRDAI 66
+G + ++ +A+GCMR++ + K+AE V TAL+ G NFFDHADIYGGG E F DAI
Sbjct: 6 LGSSSLEVPWAVGCMRINAISKKEAERFVQTALEQGANFFDHADIYGGGECEEIFADAI 65

Query: 67 KHtNVNRDKMIIQSKCGIREGYFDFSKEYILSSVDGILERLGTEYLDPLILHRPDVLVEP 126
+ R+K+ I +QSKCGIREG FDFSKEYIL SVDGIL+RL T+YLD L+LHRPD LVEP
Sbjct: 66 QMNEAVREKIILQSKCGIREGRFDFSKEYILQSVDGILQRLKTDYLDVLLLHRPDALVEP 125

Query: 127 EEVAEAFTKLRAEGKVKHFGVSNQNRFQMELLQSYLDEPLAVNQLQLSPAHTPMFDAGLN 186
EEVAEAF L + GKV+HFGVSNQN Q+ELL+ ++ +P+ NQLQLS + M +G+N
Sbjct: 126 EEVAEAFDLLESSGKVRHFGVSNQNPMQIELLKKFVRQPIVANQLQLSITNATMISSGIN 185

Query: 187 vTmi^KASIEHDDGIVDYCRLKRVTIQAWSPFQIDLSRGLFVNHPDYKELNETIAKLAKN 246
VNM N+++I D ++DYCRL VTIQ WSPFQ G+F+ + + EIiN+ I +LA+
Sbjct: 186 VNMENESAINRDGSVLDYCRLHDVTIQPWSPFQYGFFEGVFLGNDLFPELNKKIDELAEK 245

Query: 247 YNVSSEAIVIAWILRHPAKMQAIVGSMNPSRLKAIDKANDIALTRKEWYDIYRSAGNILP 306
Y VS+ I IAW+LRHPA MQ ++G+MN RLK KA++I LTR+EWY+IYR+AGNILP
Sbjct: 246 YEVSNTTIAIAWLLRHPANMQPVIGTMNLKRLKDCCKASEIRLTREEWYEIYRAAGNILP 305

There is also homology to SEQ ID 780. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1172 

A DNA sequence (GBSx1248) was identified in S. agalactiae <SEQ ID 3641> which encodes the amino
acid sequence <SEQ ID 3642>. This protein is predicted to be shikimate 5-dehydrogenase (aroE) (aroE). 
Analysis of this protein sequence reveals the following: 

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0988 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC74762 GB:AE000264 putative oxidoreductase [Escherichia coli K12] 
Identities = 114/279 (40%), Positives = 171/279 (60%), Gaps = 3/279 (1%) 

Query: 10 LTGLIANPARHSLSPLMVMSFQEKNMNYAYLTFEVEEGKLTEaWGVT^GIRGVWSM 69 
L GL+A P RHSLSP M N + ++ + + Y+ FEV+ A+ G++AL +RG VSM

Sbjct: 9 LIGLMAYPIRHSLSPEMQNKALEKAGLPFTYMAFEVDNDSFPGAIEGLKALKMRGTGVSM 68 

Query: 70 PFKQSVIPLLDDLSPQAKLVGAVNTIVNQGGTGRLVGHMTDGIGCFKALAAQGFSAKNKI 129 
P KQ +D+L+P AKLVGA+NTIVN G R G+ TDG G +A+ GF K K 

Sbjct: 69 PNKQIACEYVDELTPAAKLVGAINTIVNDDGYLR- -GYNTDGTGHIRAIKESGFDIKGKT 126

Query: 130 ITIAGIGGSGKAVAVQAAMEGVAEIRLFNRNSSNYDKVIDLSDKIKKQFQIKVVVDYLEN 189

+ + G GG+ A+ Q A+EG+ EI+LFNR +DK +++++ VVL + 

Sbjct: 127 MVLLGAGGASTAIGAQGAIEGLKEIKLFNRRDEFFDKAIAFAQRVNENTDCvTO/TDLM 186 

Query: 190 KTAFKDAIRTSHFYIDATSLGMRPLDNYSLINDPEILTPNLVWDLVYKPKETALLRFVR 249 

+ AF +A+ ++ + T +GM+PL+N SL+ND +L P L+V + VY P T LL+ + 
Sbjct: 187 QQAFAFAIASADILTNGTKVGMKPLENESLVNDISLLHPGLLVTECVYNPHMTKLLQQAQ 246 

Query: 250 QNGVKHAYNGLGMLIYQGAEAFQLITNQEMPISSVERVL 288

Q G K +G GML++QGAE F L T ++ P+ V++V+ 
Sbjct: 247 QAGCK-TIDGYGMLLWQGAEQFTLWTGKDFPLEYVKQVM 284 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3643> which encodes the amino acid
sequence <SEQ ID 3644>. Analysis of this protein sequence reveals the following:

Possible site: 54 
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC74762 GB:AE000264 putative oxidoreductase [Escherichia coli]

Identities = 132/280 (47%) , Positives = 186/280 (66%) , Gaps = 3/280 (1%) 

Query: 11 LVSLLATPIRHSLSPKMHNEAYAKLGLDYAYLAFEVGTEQLADAVQGIRALGIRGSNVSM 70 
L+ L+A PIRHSLSP+M N+A K GL + Y+AFEV + A++G++AL +RG+ VSM 
Sbjct: 9 LIGLMAYPIRHSLSPEMQNKALEKAGLPFTYMAFEVDNDSFPGAIEGLKALKMRGTGVSM 68

Query: 71 PNKEAILPLLDDLSPAAELVGAVNTVVNKDGKGHLVGHITDGIGALRALADEGVSVKNKI 130 

PNK+ +D+L+PAA+LVGA+NT+VN DG +h G+ TDG G +RA+ + G +K K 

Sbjct: 69 PNKQIACEYVDELTPAAKLVGAINTIVNDDG--YLRGYNTDGTGHIRAIKESGFDIKGKT 126 

Query: 131 ITIAGVGGAGKAIAVQIAFDGAKEWLFNRQATRLSSVQKLVTKIMQLTRTKVTLQDLED 190 

+ L G GGA AI Q A +G KE++LFNR+ ++N+ T VT+ DL D 

Sbjct: 127 MVLLGAGGASTAIGAQGAIEGLKEIKLFNRRDEFBT)KALAFAQRVNENTDCTA7TVTDLAD 186 

Query: 191 QTAFKEAIRESHLFIDATSVGMKPLENLSLITDPELIRPDLWFDIVYSPAETKLIAFAR 250

Q AF EA+ + + + T VGMKPLEN SL+ D h+ P L+V + VY+P TKLL A+ 
Sbjct: 187 QQAFAEALASADILTNGTKVGMKPLENESLVNDISLLHPGLLVTEOTYNPHMTKLLQQAQ 246 

Query: 251 QHGAQKVINGLGMVLYQGAEAFKLITGQDMPVDAIKPLLG 290 
Q G K I+G GM+L+QGAE F L TG+D P++ +K ++G

Sbjct: 247 QAGC-KTIDGYGMLLWQGAEQFTLWTGKDFPLEYVKQVMG 285 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 166/288 (57%), Positives = 221/288 (76%)

Query: 4 LNGETLLTGLIANPARHSLSPLMWIWSFQEK^ 63 

L+G TLL L+A P RHSLSP M N ++ + ++YAYL FEV +L +AV+G+RALGIR 
Sbjct: 5 LSGHTLLVSLLATPIRHSLSPKMHNEAYAKLGLDYAYLAFEVGTEQLADAVQGIRALGIR 64 

Query: 64 GVWSMPFKQSVIPLLDDLSPQAKLVGAVNTIWQGGTGRLVGHMTDGIGCFKALAAQGF 123 

G NVSMP K++++PLLDDLSP A+LVGAVNT+VN+ G G LVGH+TDGIG +ALA +G 
Sbjct: 65 GSWSMPNKEAILPLLDDLSPAAELVGAVNTVVNK1X3KGHLVGHITDGIGALRALADEGV 124 

Query: 124 SAKNKIITIAGIGGSGKAVAVQAAMEGVAEIRLFNRNSSNYDKVIDLSDKIKKQFQIKVV 183 

S KNKI IT+AG+GG+GKA+AVQ A +G E+RLFNR ++ V L K+ + + KV 
Sbjct: 125 SWNKIlTIAGVGGAGKAIAVQIAFDGAraOTUjFIffiQATRLSSVQKLVTKIjNQLTRTKOT 184 

Query: 184 VDYLFJSIKTAFKDAIRTSHFYIDATSLGMRPLDlTOSLINDPEILTPNLvvvIJLVYKPKETA 243

+ LE++TAFK+AIR SH +IDATS+GM+PL+N SLI DPE++ P+LW D+VY P ET 
Sbjct: 185 LQDLEDQTAFKEAIRESHLFIDATSVGMKPLENLSLITDPELIRPDLWFDIVYSPAETK 244 

Query: 244 LLRFVRQNGVKHAYNGLGMLIYQGAEAFQLITNQEMPISSVERVLQTE 291 
LL F RQ+G + NGLGM++YQGAEAF+LIT Q+MP+ +++ +L E

Sbjct: 245 LLAFARQHGAQKVINGLGMVLYQGAEAFKLITGQDMPVDAIKPLLGDE 292 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1173

A DNA sequence (GBSx1249) was identified in S. agalactiae <SEQ ID 3645> which encodes the amino
acid sequence <SEQ ID 3646>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -6.16 Transmembrane 57 - 73 ( 53 - 76)

Final Results

bacterial membrane Certainty=0.3463 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1174 

A DNA sequence (GBSx1250) was identified in S. agalactiae <SEQ ID 3647> which encodes the amino
acid sequence <SEQ ID 3648>. Analysis of this protein sequence reveals the following: 

Possible site: 17 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2333 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 10103> which encodes amino acid sequence <SEQ ID
10104> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB05343 GB:AP001512 L-asparaginase [Bacillus halodurans] 
Identities = 158/319 (49%) , Positives = 214/319 (66%) , Gaps = 4/319 (1%) 

Query: 1 MKKILvIiHTGGTISMNAlTOKGQVMSSADNPMKYvDLSLDDL-DLTVvriFLNLPSPQITPH 59

MKK+LV+HTGGTI+M+ +EKG V NP+ SL + + V DFLN+PSP +TP 

Sbjct: 1 MKKVLVIHTGGTIAMHEDEKGGVQPKETNPLFATVESLTSIASIEVDDFLNIPSPHMTPE 60 

Query: 60 HMLDIYHYLKQHASW--FDGWITHGTDTLEETAYFLDTMILPKIPIIITGAMRSTNELG 117 
M + LK N FDGWITHGTDTLEETAY LD ++ ++P+++TGAMRS+NELG

Sbjct: 61 LMFQIJffiRLKSRVGlffiSFDGWITHGTDTLEETAYLLDLLLDWEVPVVvTGAMRSSNELG 120 

Query: 118 SDGVYNYLSALRVANSTK&ADKGVLVVMITOEIHAAKYVTKTOT 177 
+DG +N++SA++ A + +A KGVLW NDEIH AK VTKTHT+NV+TFQ+P +GP+GI+ 
Sbjct: 121 ADGPHNFISAVKTAATDEAKGKGVLWFNDEIHTAKNVTKTHTSNVATFQSPQYGPIGIV 180

Query: 178 MKQDLLFFKATEERVRFDLDKITGTVPIVKAYAGMGDSGIISFLNSQNISGLVIEALGAG 237 

K++FA +++I V ++KAYAGM D ++ + I GLVIEA G G 
Sbjct: 181 TKRGVTFHHAPSYKESYTVSSIDHRWLLKAYAGM-DGSWDAIADTGIDGLVIEAFGQG 239 

Query: 238 NMPPKAAQEIEELIEQGVPVVLVSRCFNGIAEPWGYEGGGAKLQESGVMFVKELNAPKA 297 

N+PP 1+ L + +PWLVSR +GI + Y YEGGG L++ GV+F LN KA 

Sbjct: 240 NLPPAWPSIKRLHQANIPWLVSRSVSGIVQETYAYEGGGRHLKDLGVIFTNGLNGQKA 299 

Query: 298 RLKLLIALNAGLTGQNLKD 316

RLKLL+AL + L++ 

Sbjct: 300 RLKLLVALELTTDRKKLQE 318 
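
The summary line that precedes each alignment (Identities, Positives, Gaps) can be read mechanically. The following is a minimal sketch, assuming the summary line keeps the `Name = n/total` layout shown above; the function name `parse_alignment_summary` is hypothetical and is not part of any tool used in this analysis.

```python
import re

# Matches "Identities = 158/319", "Positives = 214/319", "Gaps = 4/319".
SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line):
    """Return {field: (count, aligned_length)} for one alignment summary line."""
    return {
        name.lower(): (int(count), int(total))
        for name, count, total in SUMMARY_RE.findall(line)
    }


summary = parse_alignment_summary(
    "Identities = 158/319 (49%) , Positives = 214/319 (66%) , Gaps = 4/319 (1%)"
)
identities, aligned_length = summary["identities"]
# Fraction of aligned positions that are identical in both sequences.
identity_fraction = identities / aligned_length
print(f"identity fraction: {identity_fraction:.3f}")
```

The sketch only extracts the counts and recomputes a fraction; it does not attempt to reproduce the rounding used for the percentages printed in the alignment output.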

A related DNA sequence was identified in S.pyogenes <SEQ ID 3649> which encodes the amino acid
sequence <SEQ ID 3650>. Analysis of this protein sequence reveals the following:

Possible site: 16
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.28 Transmembrane 245 - 261 ( 243 - 261)

Final Results

bacterial membrane --- Certainty=0.1914 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB05343 GB:AP001512 L-asparaginase [Bacillus halodurans] 
Identities = 158/320 (49%) , Positives = 218/320 (67%) , Gaps = 5/320 (1%) 

Query: 1 MKKILVLHTGGTISMQADNSGRWPNQDNPM-TKIHAAAQDIQLTVSDFLNLPSPHITPH 59 
MKK+LV+HTGGTI+M D G V P + NP+ + + + V DFLN+PSPH+TP

Sbjct: 1 MKKVLVIHTGGTIAMHEDEKGGVQPKETNPLFATVESLTSIASIEVDDFLNIPSPHMTPE 60

Query: 60 HMLSIYHHIQERT--DVFDGIVITHGTDTLEETAYFLDTMALPTNIPWLTGAMRSSNEV 117 
M + ++ R + FDG+VTTHGTDTLEETAY LD + L +PW+TGAMRSSNE+ 
Sbjct: 61 LMFQLAERLKSRVGNESFDGWITHGTDTLEETAYLLDLL-LDWEVPVWTGAMRSSNEL 119

Query: 118 GSDGIYNYLTALRVASSDKAKEKGVLVVMNDEIHflAKYVTKTHTTNISTFQTPTHGPLGI 177 

G+DG +N+++A++ A++D+AK KGVLW NDEIH AK VTKTHT+N++TFQ+P +GP+GI 
Sbjct: 120 GADGPHNFISATOTAATDFAKGKGVLWFNDEIHTAKNVTKTHTSNVATFQSPQYGPIGI 179 


Query: 178 IMKNDLLFFKTAEPRIRFDLRCISGTIPIIKAYAGMGDGSILSLLTPGSIQGLVIEALGA 237 

+K+F + + + I + ++KAYAGM DGS++ + I GLVIEA G 

Sbjct: 180 VTKRGVTFHHAPSYKESYWSSIDHRWLLKAYAGM-DGSVVDAIADTGIDGLVIEAFGQ 238 

Query: 238 GNVPPLAVGEIEHLIALGIPVILVSRCFNGMAEPVYAYEGGGAMLQEAGVMFVKELNAPK 297

GN+PP V I+ L IPV+LVSR +G+ + YAYEGGG L++ GV+F LN K
Sbjct: 239 GNLPPAVVPSIKRLHQANIPVVLVSRSVSGIVQETYAYEGGGRHLKDLGVIFTNGLNGQK 298



Query: 298 ARLKLLIALNAGLTGQELKD 317
ARLKLL+AL ++L++
Sbjct: 299 ARLKLLVALELTTDRKKLQE 318 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 242/321 (75%) , Positives = 275/321 (85%) , Gaps = 1/321 (0%) 

Query: 1 MKKILVLHTGGTISMNANEKGQVMSSADNPMKYVDLSLDDLDLTVVDFLNLPSPQITPHH 60

MKKILVLHTGGTISM A+ G+V+ + DNPM + + D+ LTV DFLNLPSP ITPHH 
Sbjct: 1 MKKILVLHTGGTISMQADNSGRWPNQDNPMTKIHAAAQDIQLTVSDFLNLPSPHITPHH 60 

Query: 61 MLDIYHYLKQHASNFDGWITHGTDTLEETAYFLDTMILP-KIPIIITGAMRSTNELGSD 119 

ML IYH++++ FDG+VITHGTDTLEETAYFLDTM LP IP+++TGAMRS+NE+GSD 
Sbjct: 61 MLSIYHHIQERTDVFDGIVITHGTDTLEETAYFLDTMALPTNIPWLTGAMRSSNEVGSD 120 

Query: 120 GVYNYLSALRVANSTKAADKGVLVVMITOEIHAAKYVTKTHTTNVSTFQTPTH 179

G+YNYL+ALRVA+S KA +KGVLVVMNDEIHAAKYVTKTHTTN+STFQTPTHGPLGIIMK 
Sbjct: 121 GIYNYLTALRVASSDKAKEKGVLVVMNDEIHAAKYVTKTHTTNISTFQTPTHGPLGIIMK 180 

Query: 180 QDLLFFKATEERTOFDLDKITGTVPIVKAYAGMGDSGIISFLNSQNISGLVIEALGAGNM 239 
DLLFFK E R+RFDL I+GT+PI+KAYAGMGD I+S L +I GLVIEALGAGN+

Sbjct: 181 NDLLFFKTAEPRIRFDLRCISGTIPIIKAYAGMGDGSILSLLTPGSIQGLVIEALGAGNV 240 

Query: 240 PPKAAQEIEELIEQGVPWLVSRCFNGIAEPVYGYEGGGAKLQESGVMFVKELNAPKARL 299 
PP A EIE LI G+PV+LVSRCFNG+AEPVY YEGGGA LQE+GVMFVKELNAPKARL 
Sbjct: 241 PPLAVGEIEHLIALGIPVILVSRCFNGMAEPVYAYEGGGAMLQEAGVMFVKELNAPKARL 300

Query: 300 KLLIALNAGLTGQNLKDYIEG 320 

KLLIALNAGLTGQ LKDYIEG 
Sbjct: 301 KLLIALNAGLTGQELKDYIEG 321 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1175

A DNA sequence (GBSx1251) was identified in S.agalactiae <SEQ ID 3651> which encodes the amino
acid sequence <SEQ ID 3652>. Analysis of this protein sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4427 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
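
The "Final Results" block lists one certainty value per candidate localization, and the predicted localization is the entry with the highest certainty. The following is a minimal sketch of reading such a block, assuming each line keeps the `bacterial <site> ... Certainty=<value> (<verdict>)` layout; the regular expression tolerates the stray spaces that scanning introduces into the numbers. The names `RESULT_RE` and `parse_localization` are hypothetical.

```python
import re

# "bacterial cytoplasm --- Certainty=0.4427 (Affirmative) < succ>"
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9.\s]+?)\s*\(([^)]*)\)"
)


def parse_localization(lines):
    """Return a list of (site, certainty, verdict) tuples, one per result line."""
    results = []
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            site, raw_value, verdict = match.groups()
            # Remove scanning artefacts such as "0. 4427" or "0 . 0000".
            certainty = float(raw_value.replace(" ", ""))
            results.append((site, certainty, verdict.strip()))
    return results


block = [
    "bacterial cytoplasm --- Certainty=0.4427 (Affirmative) < succ>",
    "bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>",
    "bacterial outside --- Certainty=0. 0000 (Not Clear) < succ>",
]
results = parse_localization(block)
# The predicted localization is the entry with the highest certainty.
predicted = max(results, key=lambda entry: entry[1])
print(predicted)
```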

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB85142 GB:AL162757 conserved hypothetical protein [Neisseria meningitidis Z2491]
Identities = 87/285 (30%), Positives = 138/285 (47%), Gaps = 35/285 (12%) 

Query: 4 KAVFFDIDGTLLNDRKNVQKSTIK-AIRNLKDQGILVGLATGRG PSFVQPFLENLG 58 

K VFFDID TL + + ++K A+ L+ +GIL LATGR P V+ + G

Sbjct: 11 KIVFFDIDDTLYRKYTDTLRPSVKTAVAALRGKGILTALATGRSLATIPEKVRDMMAETG 70 

Query: 59 LDFAVTYNGQYIYSRSEIIYTNQLSKTTVYRLIRYAGARRREISLGTASGLLGSGIIGLG 118 
+D VT NGQ+ + + + + R+ + SLG +G G+ 

Sbjct: 71 MDAWTINGQFALLHGKTVCEVPMDAGLMGRVCAHLD SLGMDYAFVGGE- -GIA 122

Query: 119 TSRLGQIVSSLVPRKWAKAIERSFKHFIRRIKPQNIDSLMVILREPIYQWLVATEGE-- 176 

S L + V R+ KH I +P+YQ+++ A E E 

Sbjct: 123 VSALSECVC RALKH IASDFFADKDYFSSKPVYQMLVFAEENEMP 166 






Query: 177 --SERIQKQFPRVKLTRSSPYSMDVISEGQSKVKGIERVGQRYGFDLSEVIAFGDSDNDI 234
S+ ++++ +K R ++D++ G SK GI V + G ++++V+AFGD ND+
Sbjct: 167 LWSDI VERE - -GLKTVRVfflEEAVDLIjPAGASKTIXSIRSVVEALGLEMfiDVMAFGDGIiNDV 224 

Query: 235 EMLSQVGIGVAMGNASQQVRENARYTTADNNDDGISKALAHYGLI 279 
EMLS+VG GVAMGN Q +E A+Y ++DG+ + L G+I

Sbjct: 225 EMLSEVGFGVAMGNGEQAAKEAAKYVCPGVDEDGVLRGLQDLGVI 269 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3653> which encodes the amino acid
sequence <SEQ ID 3654>. Analysis of this protein sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.6014 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 320/459 (69%), Positives = 391/459 (84%) 

Query: 1 MAIKAVFFDIDGTLLNDRKNVQKSTIKAIRNLKDQGILVGLATGRGPSFVQPFLENLGLD 60
+ +KAVFFDIDGTLLNDRKN+QK+T KAI+ LK QGI+VGIATGRGP FVQPFLEN GLD
Sbjct: 1 LTVKAVFFDIDGTLLNDRKNIQKTTQKAIQQLKKQGIMVGIATGRGPGFVQPFLENFGLD 60

Query: 61 FAVTYNGQYIYSRSEIIYTNQLSKTTVYRLIRYAGARRREISLGTASGLLGSGIIGLGTS 120
FAVTYNGQYI +R +++Y NQL K+ +Y++IRYA ++REISLGTASGL GS II +GTS
Sbjct: 61 FAVTYNGQYILTRDKVLYQNQLPKSMIYKVIRYANEKKREISLGTASGIAGSRIIDMGTS 120

Query: 121 RLGQIVSSLVPRKWAKAIERSFKHFIRRIKPQNIDSLMVILREPIYQWLVATEGESERI 180
GQ++SS VP+ WA+ +E SFKH IRRIKPQ+ +L+ I+REPIYQWLVA++ E+++I
Sbjct: 121

Query: 181
Q++FP +K+TRSSPYS+D+IS QSK+KGIER+G+ +GFDLSEV+AFGDSDND+EMLS V
Sbjct: 181

Query: 241
GIG+AMGNA V++ A +TT NN+DGISKALAHYGLI F+IEK+F SRDENFNKVK F
Sbjct: 241

Query: 301
H LMD +TIETPR Y EAG+RS FKVEEIVEFLYAAS+G+Q+ F Q+I +LH A+D+A
Sbjct: 301

Query: 361
+KV +K H ETPL+G+VDAL DLLY TYGSFVLMGVDP+P+F+ VHEANM KIFPDGKA
Sbjct: 361

Query: 421
HFDPVTHKI KPD W+E APE +I++ELD Q+QKSL R
Sbjct: 421 HFDPVTHKIQKPDYWQERHAPEVAIKKELDKQLQKSLQR 459

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1176

A DNA sequence (GBSx1252) was identified in S.agalactiae <SEQ ID 3655> which encodes the amino
acid sequence <SEQ ID 3656>. Analysis of this protein sequence reveals the following:

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1671 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10101> which encodes amino acid sequence <SEQ ID
10102> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06903 GB:AP001518 unknown conserved protein [Bacillus halodurans] 
Identities = 61/141 (43%) , Positives = 92/141 (64%) 

Query: 22 YERILVAIDGSTESELAFEKAVNVALRNDSELILTHVIDTRALQSFATFDTYIYEKLEKE 81 

Y I LVA+DGST+ + + A KA N A ++L + HVID+R+ + +D + E + 
Sbjct: 2 YNHILVAVDGSTQAKRALYKAFNYAKEFKADLFICHVIDSRSFATVEQYDRTWGAAELD 61 

Query: 82 AKDVLEEYEKQAREKGADKVRQVIEFGNPKTLLAHDIPEKEKVDLIMVGATGLNTFERFX 141

K +L+ Y ++A + G DKV +++FG+PK ++ I +K +DLI+ GATGLN ERF
Sbjct: 62 GKKLLQRYSEEAEKAGVDKVHTILDFGSPKANISKTIAQKYDIDLIITGATGLNAVERFL 121

Query: 142 IGSSSEYILRHAKVDLLIVRD 162 

+GS SE + RHAK D+LIVR+ 
Sbjct: 122 MGSVSESVARHAKCDVLIVRN 142 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3657> which encodes the amino acid
sequence <SEQ ID 3658>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence



An alignment of the GAS and GBS proteins is shown below. 

Identities = 117/156 (75%) , Positives = 135/156 (86%) 

Query: 12 LEEDRLMSQKYERILVAIDGSTESELAFEKAVNVALRNDSELILTHVIDTRALQSFATFD 71
L+ED MS KY+RILVAIDGS ESELAF K VNVALRND+ L+L HVIDTRALQS ATFD
Sbjct: 25 LKEDSSMSLKYKRILVAIDGSYESELAFNKGVNVALRNDATLLLVHVIDTRALQSVATFD 84

Query: 72 TYIYEKLEKEAKDVLEEYEKQAREKGADKVRQVIEFGNPKTLLAHDIPEKEKVDLIMVGA 131
TYIYEKLE+EAKDVL+++EKQA+ G ++Q+IEFGNPK LLAHDIP++E DLIMVGA
Sbjct: 85 TYIYEKLEQEAKDVLDDFEKQAQIAGITNIKQIIEFGNPKNLLAHDIPDRENADLIMVGA 144

Query: 132 TGLNTFERFXIGSSSEYILRHAKVDLLIVRDPNKTM 167
TGLNTFER IGSSSEYI+RHAK+DLL+VRD KT+
Sbjct: 145 TGLNTFERLLIGSSSEYIMRHAKIDLLVVRDSTKTL 180

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1177 

A DNA sequence (GBSx1253) was identified in S.agalactiae <SEQ ID 3659> which encodes the amino
acid sequence <SEQ ID 3660>. This protein is predicted to be aspartate aminotransferase (aspC). Analysis
of this protein sequence reveals the following:

Final Results

bacterial cytoplasm --- Certainty=0.1296 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2803 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC21948 GB:U32714 aminotransferase [Haemophilus influenzae Rd] 
Identities = 142/212 (66%) , Positives = 181/212 (84%) , Gaps = 1/212 (0%) 

Query: 1 MKIFDKSMKLEHVAYDIRGPVLEEADRMRANGEKILRLNTGNPAAFGFEAPDEVIRDLIT 60
M++F KS KLEHV YDIRGPV +EA R+ G KIL+LN GNPA FGFEAPDE++ D++
Sbjct: 1 MRLFPKSDKLEHVCYDIRGPVHKEALRLEEEGNKILKLNIGNPAPFGFEAPDEILVDVLR 60

Query: 61 NARESEGYSDSKGIFSARKAVMQYYQLQNI-HVDMDDIYIVNGVSEGISMSMQALLDNDD 119
N ++GY DSKG++SARKA++QYYQ + I ++D+YI NGVSE I+M+MQALL++ D
Sbjct: 61 NLPSAQGYCDSKGLYSARKAIVQYYQSKGILGATVNDVYIGNGVSELITMAMQALLNDGD 120

Query: 120 EVLVPMPDYPLWTACVSLAGGNAVHYICDEEANWYPDIDDIKSKITSKTKAIVLINPNNP 179
EVLVPMPDYPLWTA V+L+GG AVHY+CDE+ANW+P IDDIK+K+ +KTKAIV+INPNNP
Sbjct: 121 EVLVPMPDYPLWTAAVTLSGGKAVHYLCDEDANWFPTIDDIKAKVNAKTKAIVIINPNNP 180

Query: 180 TGAVYPREILQEIVDIARQNDLIIFSDEVYDR 211
TGAVY +E+LQEIV+IARQN+LIIF+DE+YD+
Sbjct: 181 TGAVYSKELLQEIVEIARQNNLIIFADEIYDK 212

A related DNA sequence was identified in S.pyogenes <SEQ ID 3661> which encodes the amino acid 
sequence <SEQ ID 3662>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 170/212 (80%) , Positives = 193/212 (90%) , Gaps = 1/212 (0%) 



Query: 1 MKIFDKSMKLEHVAYDIRGPVLEEADRMRANGEKILRLNTGNPAAFGFEAPDEVIRDLIT 60
MKI +KS KLEHVAYDIRGPVL+EA+RM A+GEKILRLNTGNPAAFGFEAPDEVIRDLI
Sbjct: 13 MKIIEKSSKLEHVAYDIRGPVLDFJU^IASGEKILRLNTGNPAAFGFEAPDEVIRDLIV 72

Query: 61 NARESEGYSDSKGIFSARKAVMQYYQLQNI-HVDMDDIYIVNGVSEGISMSMQALLDNDD 119
NAR SEGYSDSKGIFSARKA+MQY QL+ VD++DIY+ NGVSE IS+S+QALLDN D
Sbjct: 73 NARLSEGYSDSKGIFSARKAIMQYCQLKGFPDVDIEDIYLGNGVSELISISLQALLDNGD 132

Query: 120 EVLVPMPDYPLWTACVSLAGGNAVHYICDEEANWYPDIDDIKSKITSKTKAIVLINPNNP 179
EVLVPMPDYPLWTACVSL GG AVHY+CDEEA WYPDI DIKSKITS+TKAIV+INPNNP
Sbjct: 133 EVLVPMPDYPLWTACVSLGGGKAVHYLCDEEAGWYPDIADIKSKITSRTKAIVVINPNNP 192

Query: 180 TGAVYPREILQEIVDIARQNDLIIFSDEVYDR 211
TGA+YP+EIL++IV +AR++ LIIF+DE+YDR
Sbjct: 193 TGALYPKEILEDIVALAREHQLIIFADEIYDR 224





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1178 

A DNA sequence (GBSx1254) was identified in S.agalactiae <SEQ ID 3663> which encodes the amino
acid sequence <SEQ ID 3664>. Analysis of this protein sequence reveals the following:

Final Results

bacterial cytoplasm --- Certainty=0.2936 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

Possible site: 60

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-14.75 Transmembrane 38 - 54 ( 29 - 60)

Final Results

bacterial membrane --- Certainty=0.6901 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
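
Each INTEGRAL line reports a likelihood score, the predicted transmembrane segment, and, in parentheses, an extended range around it. The following is a minimal sketch of extracting those fields and the segment length, assuming the line layout shown above; `TM_RE` and `parse_tm_segment` are hypothetical names introduced only for illustration.

```python
import re

# "INTEGRAL Likelihood =-14.75 Transmembrane 38 - 54 ( 29 - 60)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segment(line):
    """Return (score, (start, end), (ext_start, ext_end)) or None if no match."""
    match = TM_RE.search(line)
    if match is None:
        return None
    score = float(match.group(1))
    core = (int(match.group(2)), int(match.group(3)))
    extended = (int(match.group(4)), int(match.group(5)))
    return score, core, extended


segment = parse_tm_segment(
    "INTEGRAL Likelihood =-14.75 Transmembrane 38 - 54 ( 29 - 60)"
)
score, (start, end), extended = segment
# Residue positions are inclusive, so the length is end - start + 1.
segment_length = end - start + 1
print(score, segment_length)
```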

A related GBS nucleic acid sequence <SEQ ID 9389> which encodes amino acid sequence <SEQ ID 9390>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 3665> which encodes the amino acid
sequence <SEQ ID 3666>. Analysis of this protein sequence reveals the following:

Possible site: 43

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-15.97 Transmembrane 35 - 51 ( 25 - 58)

Final Results

bacterial membrane --- Certainty=0.7389 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

An alignment of the GAS and GBS proteins is shown below.

Identities = 51/87 (58%) , Positives = 63/87 (71%) , Gaps = 7/87 (8%) 

Query: 1 MAKKPWEKKVVENNSHRKDKITRTSRGVVSSTPWITAFLSAFFVIVVAILFIVFYTSNRG 60
MAK+PWE+K+V++ + TR SR STPW+TA LS FFVI+VAILFI FYTSN G
Sbjct: 1 MAKEPWEEKIVDDTIGTR TRKSRNAFISTPWLTALLSVFFVIIVAILFIFFYTSNSG 57

Query: 61 EDRAKETSGFYGASSQKVNSSKTKKAS 87 

+R ET+GFYGAS+ K KT+KAS 
Sbjct: 58 SNRQAETNGFYGASTHK KTRKAS 80 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1179 

A DNA sequence (GBSx1255) was identified in S.agalactiae <SEQ ID 3667> which encodes the amino
acid sequence <SEQ ID 3668>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0815 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 3669> which encodes the amino acid
sequence <SEQ ID 3670>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0107 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 43/64 (67%) , Positives = 53/64 (82%) 

Query: 1 MKVALIPEKCIACGLCQTYSNIFDYQDDGIVKFSDTDNLEKEIPSSDQDTVLAVKSCPTK 60
MKV++IPEKCIACGLCQTYS++FDY D+GIV FS + +I SD+D +IAVKSCPTK
Sbjct: 1 MKVSIIPEKCIACGLCQTYSSLFDYHDNGIVTFSSSSETSQSICPSDKDAIIAVKSCPTK 60

Query: 61 ALTI 64
ALT+
Sbjct: 61 ALTL 64

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1180

A DNA sequence (GBSx1256) was identified in S.agalactiae <SEQ ID 3671> which encodes the amino
acid sequence <SEQ ID 3672>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.61 Transmembrane 47 - 63 ( 41 - 69)

Final Results

bacterial membrane --- Certainty=0.5246 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 






>GP:AAC36851 GB:L23802 pore-forming peptide [Enterococcus faecalis] 
Identities = 42/130 (32%) , Positives = 63/130 (48%) , Gaps = 9/130 (6%) 

Query: 7 KIRYHWQPELSWAIIYWSIAIAPIFIGLSLLYERTEIPSQVFVLFAIFIVLVGIGFH 63

K +++WQPEL+ I IYWS +FI L L E I+VVF+FL G 

Sbjct: 3 KQKFYWQPELASTIIYWSCTFCILFISLILALENNGPYLISNLVMVPFFVFAYL---GIA 59 

Query: 64 RYFVIEEDGYLRIVSFNFLRRTKFPIEDIAKIEVTKSSVTIKFNNNHE--RIFYMRKWPK 121

RF+E L+ +R+ P+IK+ +S+I+ E ++F M+K 
Sbjct: 60 RSFM^TETS-LIWDVLWFRKKALPLSQIEKVTYNEKSIEIFSSEFKEGSKVFLMKKKTD 118 

Query: 122 KYFLDALAIE 131 
FL+AL I+

Sbjct: 119 SLFLEALKIK 128 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3673> which encodes the amino acid
sequence <SEQ ID 3674>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.87 Transmembrane 47 - 63 ( 41 - 69)
INTEGRAL Likelihood = -3.35 Transmembrane 20 - 36 ( 18 - 37)

Final Results

bacterial membrane --- Certainty=0.4949 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAC36851 GB:L23802 pore-forming peptide [Enterococcus faecalis] 
Identities = 42/130 (32%), Positives = 70/130 (53%), Gaps = 12/130 (9%) 

Query: 7 KIRYHWQPELSWSIIYWSIAFAPIFVGLSLLYERTEIPSRVFILFAIFAVLVGIGLH 63

K +++WQPEL+ +IIYWS F +F+ L L E I+V+F +FA L G+ 
Sbjct: 3 KQKFYWQPELASTIIYWSCTFCILFISLILALENNGPYLISNLVMVPFFVFAYL GIA 59 

Query: 64 RYF-IIENNGILRIVSFKLFGPRKLLISTITKIEVTKSTLCLHVEDKSYLFYMRKWP 119

R F + E + I+R V + F + It +S I K+ + ++ + ++ S +F M+K

Sbjct: 60 RSFNMTETSLIVRDVLW--FRKKALPLSQIEKVTYNEKSIEIFSSEFKEGSKVFLMKKKT 117

Query: 120 KKYFLDALAV 129 

FL+AL + 
Sbjct: 118 DSLFLEALKI 127 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 115/162 (70%), Positives = 132/162 (80%), Gaps = 1/162 (0%)

Query: 1 MIKLFGKIRYHWQPELSWAIIYWSIAIAPIFIGLSLLYERTEIPSQVFVLFAIFIVLVGI 60 

MIKLFGKIRYHWQPELSW+IIYWSIA APIF+GLSLLYERTEIPS+VF+LFAIF VLVGI 
Sbjct: 1 MIKLFGKIRYHWQPELSWSIIYWSIAFAPIFVGLSLLYERTEIPSRVFILFAIFAVLVGI 60

Query: 61 GFHRYFVIEEDGYLRIVSFNFLRRTKFPIEDIAKIEVTKSSVTIKFNNNHERIFYMRKWP 120 

G HRYF+IE +G LRIVSF K I I KIEVTKS++ + + +FYMRKWP 

Sbjct: 61 GLHRYFIIENNGILRIVSFKLFGPRKLLISTITKIEVTKSTLCLHVEDK-SYLFYMRKWP 119 

Query: 121 KKYFLDALAIEPTFKGEVELLDNLIKMDYFECYRYDKKALTK 162

KKYFLDALA+ P F+GEV L DN IK+DYFE Y++DKKALT+
Sbjct: 120 KKYFLDALAVNPYFQGEVTLSDNFIKLDYFEVYQHDKKALTR 161

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1181 

A DNA sequence (GBSx1257) was identified in S.agalactiae <SEQ ID 3675> which encodes the amino
acid sequence <SEQ ID 3676>. This protein is predicted to be peptidase T (pepT). Analysis of this protein
sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2913 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:ARA20627 GB:L27596 tripeptidase [Lactococcus lactis] 
Identities = 274/406 (67%), Positives = 334/406 (81%), Gaps = 4/406 (0%) 

Query: 1 MSYEKLLERFLTYVKINTRSNPNSTQTPTTQSQVDFALTVLKPEMEAIGLKDVHYLPSNG 60
M YEKLL RFL YVK+NTRS+ NST TP+TQ+ V+FA + +M+A+GLKDVHYL SNG
Sbjct: 1 MKYEKLLPRFLEYVKVNTRSDENSTTTPSTQALVEFAHK-MGEDMKALGLKDVHYLESNG 59

Query: 61 YLVGTLPATSDRLRHKIGFISHMDTADFNAENITPQIVDYKGGD--IELGDSGYILSPKD 118
Y++GT+PA +D+ KIG ++H+DTADFNAE + PQI++ G+ I+LGD+ + L PKD
Sbjct: 60 YVIGTIPANTDKKVRKIGLIAHLDTADFNAEGVNPQILENYDGESVIQLGDTEFTLDPKD 119



Query: 119 FPNLNNYHGQTLITTDGKTLLGADDKSGIAEIMTAMEYLAS-HPEIEHCEIRVGFGPDEE 177
FPNL NY GQTL+ TDG TLLG+DDKSG+AEIMT +YL + +P+ EH EIRVGFGPDEE



WO 02/34771 PCT/GB01/04789 

-1324- 



Sbjct: 120

Query: 178
IG+GADKFDV DFDVDFAYTVDGGPLGELQYETFSAAG + F+G+NVHPGTAKN M+NA
Sbjct: 180 IGVGADKFDVADFDVDFAYTVDGGPLGELQYETFSAAGAVIEFQGKNVHPGTAKNMMVNA

Query: 238 LQLAMDFHSQLPENERPEQTDGYQGFYHLYDLSGTVDQAKSSYIIRDFEEVDFLKRKHLA
LQLA+D+H+ LPE +RPE+T+G +GF+HL L GT ++A++ YIIRD EE F +RK L
Sbjct: 240 I^IAIDYHNALPEFDRPEKTEGREGFFHLLKUJGTPEEARAQYIIRDHEEGKFNERKALM

Query: 298 QDIADNMNEALQSERVKVKLYDQYYNMKKVIEKDMTPINIAKEVMEELDIKPIIEPIRGG
Q+IAD MN L RVK + DQYYNM ++IEKDM+ I+IAK+ ME LDI PIIEPIRGG
Sbjct: 300 QEIADKMNAELGQNRVKPVIKDQYYNMAQIIEKDMSIIDIAKKAMENLDIAPIIEPIRGG

Query: 358 TDGSKISFMGIPTPNLFAGGENMHGRFEFVSLQTMEKAVDVILGIV 403
TDGSKISFMG+PTPNLFAGGENMHGRFEFVS+QTMEKAVD +L I+
Sbjct: 360

A related DNA sequence was identified in S.pyogenes <SEQ ID 3677> which encodes the amino acid
sequence <SEQ ID 3678>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2938 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 305/406 (75%), Positives = 352/406 (86%), Gaps = 1/406 (0%)

Query: 1 MSYEKLLERFLTYVKINTRSNPNSTQTPTTQSQVDFALTVLKPEMEAIGLKDVHYLPSNG 60

M Y+ LL+RF+ YVK+NTRS P+S TP+T+SQ FALT+LKPEMEAIGL+DVHY P NG 
Sbjct: 5 MKYDNLLDRFIKYVKVNTRSVPDSETTPSTESQEAFALTILKPEMEAIGLQDVHYNPVNG 64 


Query: 61 YLVGTLPATSDRLRHKIGFISHMDTADFNAENITPQIVD-YKGGDIELGDSGYILSPKDF 119

YL+GTLPA + L KIGFI+HMDTADFNAEN+ PQI+D Y+GGDI LG S Y L PK F 
Sbjct: 65 YLIGTLPANNPTLTRKIGFIAHMDTADFNAENVNPQIIDNYQGGDITLGSSNYKLDPKAF 124 

Query: 120 PNLNNYHGQTLITTDGKTLLGADDKSGIAEIMTAMEYLASHPEIEHCEIRVGFGPDEEIG 179

PNLNNY GQTLITTDG TLLGADDKSGIAEIMTA+E+L S P+IEHC+I+V FGPDEEIG
Sbjct: 125 PNLNNYIGQTLITTDGTTLLGADDKSGIAEIMTAIEFLTSQPQIEHCDIKVAFGPDEEIG 184 

Query: 180 IGADKFDVKDFDVDFAYTVDGGPLGELQYETFSAAGLELTFEGRNVHPGTAKNQMINALQ 239 
+GADKF+V DF+VDFAYT+DGGPLGELQYETFSAA LE+TF GRNVHPGTAK+QMINAL+

Sbjct: 185 VGADKFEVADFEVDFAYTMDGGPLGELQYETFSAAALEVTFLGRNVHPGTAKDQMINALE 244 

Query: 240 LAMDFHSQLPENERPEQTDGYQGFYHLYDLSGTVDQAKSSYIIRDFEEVDFLKRKHLAQD 299
LA+DFH +LP +RPE TDGYQGFYHL L+GTV++A++SYIIRDFEE F RK ++
Sbjct: 245 LAIDFHEKLPAKDRPEYTDGYQGFYHLTGLTGTVEEARASYIIRDFEEASFEARKVKVEN 304

Query: 300 IADNMNEALQSERVKVKLYDQYYNMKKVIEKDMTPINIAKEVMEELDIKPIIEPIRGGTD 359

IA +MN L> ++RV V+L DQYYNMKKVIEKDMT I +AKEVMEEL IKP+IEPIRGGTD 
Sbjct: 305 IAQSMNAQLGTKRVl,vEIOTQYYNMKKVIEKDMTAIE3^AKEVMEELAIKPVIEPIRGGTD 364 


Query: 360 GSKISFMGIPTPNLFAGGENMHGRFEFVSLQTMEKAVDVILGIVAK 405

GSKISFMGIPTPN+FAGGENMHGRFEFVSLQTME+AVDVI+G+V K
Sbjct: 365 GSKISFMGIPTPNIFAGGENMHGRFEFVSLQTMERAVDVIIGLVCK 410

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1182

A DNA sequence (GBSx1258) was identified in S.agalactiae <SEQ ID 3679> which encodes the amino
acid sequence <SEQ ID 3680>. Analysis of this protein sequence reveals the following:

Possible site: 43

>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood =-12.26 Transmembrane 481 - 497 ( 477 - 508)
INTEGRAL Likelihood = -9.45 Transmembrane 510 - 526 ( 506 - 534)
INTEGRAL Likelihood = -7.96 Transmembrane 316 - 332 ( 310 - 334)
INTEGRAL Likelihood = -7.54 Transmembrane 354 - 370 ( 351 - 373)
INTEGRAL Likelihood = -7.11 Transmembrane 385 - 401 ( 383 - 409)
INTEGRAL Likelihood = -6.58 Transmembrane 215 - 231 ( 211 - 233)
INTEGRAL Likelihood = -6.48 Transmembrane 71 - 87 ( 69 - 91)
INTEGRAL Likelihood = -6.32 Transmembrane 110 - 126 ( 106 - 133)
INTEGRAL Likelihood = -5.10 Transmembrane 446 - 462 ( 443 - 465)
INTEGRAL Likelihood = -3.29 Transmembrane 418 - 434 ( 418 - 435)
INTEGRAL Likelihood = -2.55 Transmembrane 263 - 279 ( 263 - 279)
INTEGRAL Likelihood = -2.02 Transmembrane 142 - 158 ( 141 - 159)
INTEGRAL Likelihood = -1.70 Transmembrane 184 - 200 ( 184 - 200)

Final Results

bacterial membrane --- Certainty=0.5904 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8747> which encodes amino acid sequence <SEQ ID 8748>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 6
McG: Discrim Score: -10.58
GvH: Signal Score (-7.5): -1.1

Possible site: 32
>>> Seems to have no N-terminal signal sequence
ALOM program count: 13 value: -12.26 threshold: 0.0
modified ALOM score: 2.95

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5904 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00276 GB:AF008220 YtgP [Bacillus subtilis] 
Identities = 178/545 (32%) , Positives = 302/545 (54%) , Gaps = 26/545 (4%) 

Query: 24 QMVKGTAWLTAGNFISRLLGAIYIIPWYAWMGKHAAEANALFGMGYEIYALFLLISTVGI 83 

++++GT LT G +ISR+LG +Y+IP+ +G A ALF GY Y LFL I+T+G 
Sbjct: 4 KLLRGTFVLTLGTYISRILGMVYLIPFSIMVG ATGGALFQYGYNQYTLFLNIATMGF 60

Query: 84 PVAVAKQVSKYNTLGKEEMSIYLVRKILQFMLILGGIFALIMYIGSPLFASLSKGGQE-- 141

P AV+K VSKYN+ G E S +++ + ML+ G I I+Y+ +P+FA +S GG++ 
Sbjct: 61 PAAVSKFVSKYNSKGDYETSRKMLKAGMSVMLVTGMIAFFILYLSAPMFAEISLGGKDNN 120

Query: 142 LVPILRSLTLAVLVFPSMSVLRGFFQGFKNLKPYAISQVAEQIIRVIWMLLTAF 195

+V ++R ++LA+LV P MS++RGFFQG + P A+SQV EQI+R+I++L F 
Sbjct: 121 GLTIDHWYVIRMVSLALLWPIMSLVRGFFQGHQMMGPTAVSQWEQIVRIIFLLSATF 180 

Query: 196 YIMRLGSGDYIAAVTQSTFAAFVGMFASIAVLLYFLW--RYNMLSALIGKTPKHIKLDTK 253 
I+++ +G + AV +TFAA +G F + V+LY W R L A++ T L K

Sbjct: 181 LILKVFNGGLVIAVGYATFAALIGAFGGL-VVLYIYWNKRKGSLLAMMPNTGPTANLSYK 239 






Query: 254 EILIETIKEAIPFIITGAAIQIFKLIDQFSFGNTM--ALFTNYSSEELRVMFAYFSSNPG 311 

++ E A P++ G AI ++ ID +F MA S + L ++ Y 
Sbjct: 240 KMFFELFS YAAPYVFVGLAI PLYNYIDTNT FNKAM I EAGHQAI SQDMLAI LTLYVQ 295 

Query: 312 KOTMILIAVATAIAGVGIPLLTENFVKNDKZAAARLVVNNLQMLLMFLLPAVAGSVILAK 371 

K+ MI +++ATA IP +TE+F + K + + +Q +L ++PAV G +L+ 

Sbjct: 296 KLVMIPVSLATAFGLTLIPTITESFTSGNYKLMQQINQTMQTILFLIIPAWGISLLS^ 355 

Query: 372 PLYTVFYGL PQGQALGLFVISLIQTIILSIYTVLAPMLQALFENRKAIIYFLYGLV 427 

P YT FYG P+ A L S + 1+ S++TV A +LQ + + + A++ + G+V 

Sbjct: 356 PTYTFFYGSESLHPELGANILLWYSPV-AILFSLFTVNAAILQGINKQKFAWSLVIGW 414 

Query: 428 AKVILQLPSIFLFHAYGPLFSTTVALCIPVILMYLKIHEITGFKRQAIRRTSALVLILTL 487 

K++L +P I L A G + +T + ++ ++ I G+ + + + + L+L+L+ 

Sbjct: 415 IKLVLWPLIKLMQADGAILATALGYIASLLYGFIMIKRHAGYSYKILVKRTVLMLVLSA 474 



Query: 488 LMSFIISMIIWLMNLVI-VPDSRLVSLVYIIVIGAIGLGVYGFMALATHLLDKMIGSRAQ 546 
+M + ++ W++ I D ++ + + +++ A+G VY + L K++G R

Sbjct: 475 IMGIAVKIVQWVLGFFISYQDGQMQAAIVWIAAAVGGAVYLYCGYRLGFLQKILGRRLP 534 






Query: 547 DLRRK 551 
RK 

Sbjct: 535 GFFRK 539 






A related DNA sequence was identified in S.pyogenes <SEQ ID 3681> which encodes the amino acid 
sequence <SEQ ID 3682>. Analysis of this protein sequence reveals the following: 



Possible site: 49
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.4439 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>






The protein has homology with the following sequences in the databases: 

>GP:AAC00276 GB:AF008220 YtgP [Bacillus subtilis] 
Identities = 169/536 (31%) , Positives = 295/536 (54%) , Gaps = 24/536 (4%) 



Query: 14 MVCGAAWSTAGNFISRLLGVLYIIPWYIWGQYAIQANALFNMGYNVYAYFLLISTTGLN 73 

+++G T G +ISR+LG++Y+IP+ I +G ALF GYN Y FL I+T G 

Sbjct: 5 LLRGTFVLTLGTYISRILGMVYLIPFSIMVGA TGGRLFQYGYNQYTLFLNIATMGFP 61

Query: 74 VAIAKQVAKYNSMGQTEHSYQLIRSTLKLMIK3LGLIFSAIMYLGSPLFASLS-GGDDT-- 130

A++K V+KYNS G E S +++++ + +ML G+I I+YL +P+FA +S GG D 
Sbjct: 62 AAVSKWSKYNSKGDYETSRKMLKAGMSVMLVTGMIAFFILYLSAPMFAEISLGGKDNNG 121 

Query: 131 LVPIMHSLSLAVFIFPVMSVIRGIFQGHNNIKPYAVSQIAEQLIRVIWMLLTTFF 185

+V ++ +SLA+ + P+MS++RG FQGH + P AVSQ+ EQ++R+I++L TF 
Sbjct: 122 LTIDHWYVIRMVSLALLWPIMSLVRGFFQGHQMMGPTAVSQWEQIVRIIFLLSATFL 181 

Query: 186 IMKLGSGDYASAVTQSTFAAFIGMVASMGVLGYYLW--RQGLLAA.IFSKPDHTVSIDIKG 243 
I+K+ +G AV +TFAA IG + VL Y W ++G L A+ T ++ K

Sbjct: 182 ILKVFNGGLVIAVGYATFAALIGAFGGLWL-YIYWNKRKGSLLAMMPNTGPTANLSYKK 240 

Query: 244 LLLETLKESIPFIVTGSAIQAFQLIDQWTFVNTMTLFTDYSRSQ- -LLVLFGYFNANPAK 301 
+ E + P++ G AI + ID TF M + SQ L +L Y K 
Sbjct: 241 MFFELFSYAAPWFVGLAIPLYNYIDTNTFNKAMIEAGHQAISQDMLAILTLYVQ K 296

Query: 302 ITMVLIAVAASIGGVGIALLTENYVKKDMKAAARLIINNIEMLVMFLLPALTGAIILARP 361 

+ M+ +++A + G I +TE++ + K +1 ++ ++ ++PA+ G +L+ P 
Sbjct: 297 LVMIPVSIATAFGLTLIPTITESFTSGNYKLLNQQINQTMQTILFLIIPAWGISLLSGP 356 


Query: 362 LYSVFYGASE ERAIHLFVAVLFQTLLLALYTLFSPMLQALFENRKAIYYFAYGILIK 418 

Y+ FYG+ E ++ + +L +L+T+ + +LQ + + + A+ G++IK 

Sbjct: 357 TYTFFYGSESLHPELGANILLWYSPVAILFSLFTVNAAILQGINKQKFAWSLVIGWIK 416 

Query: 419 LVLQIPLIYLLHAYGPLIATTIALWPIYLMYRRLYQVTHFNRKLLQKRLLLTLIETLLM 478

LVL +PLI L+ A G +LAT + + + + + + ++ K+L KR +L L+ + +M 
Sbjct: 417 LVLNVPLIKLMQADGAIIATALGYIASLLYGFIMIKRHAGYSYKILVKRTVLMLVLSAIM 476 

Query: 479 GLWFVANWLLGYAFK- PTGRLTSLLYLLI IGGLGMTVYTALTLLTHQLDKLIGSK 533 
G+ V + W+LG+ G++ + + ++I +G VY L K++G +

Sbjct: 477 GIAVKIVQWVLGFFISYQDGQMQAAIVWIAAAVGGAvYLYCGYRLGFLQKILGRR 532 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 320/541 (59%) , Positives = 431/541 (79%) 


Query: 12 MSQKTTKVSQQEQMWGTAWLTAGNFISRLLGAIYIIPWYAWMGKHAREANALFGMGYEI 71 

MS + +++Q+E MV+G AW TAGNFISRLLG +YIIPWY WMG++A +ANALF MGY + 
Sbjct: 1 MSTEKKQLTQEELMVQGARWSTAGNFISRLLGVLYIIPWYIWMGQYAIQANALFNMGYNV 60 

Query: 72 YALFLLISTVGI PVAVAKQVSKYNTLGKEEMS I YLVRKILQFMLILGGI FALIMYIGSPL 131 

YA FLLIST G+ VA+AKQV+KYN++G+ E S L+R L+ ML LG IF+ IMY+GSPL 
Sbjct: 61 YAYFLLI STTGLNVAIAKQVAKYNSMGQTEHSYQLIRSTLKLMLGLGLI FSAI MYLGS PL 120 

Query: 132 FASLSKGGQELVPILRSLTLAVLVFPSMSVLRGFFQGFNNLKPYAI SQVAEQI IRVIWML 191 
FASLS G LVPI+ SL+LAV +FP MSV+RG FQG NN+KPYA+SQ+AEQ+ IRVIWML 

Sbjct: 121 FASLSGGDDTLVPIMHSLSLAVFIFPVMSVIRGIFQGHNNIKPYAVSQIAEQLIRVIWML 180 

Query: 192 LTAFYIMRLGSGDYIAAVTQSTFAAFVGMFASIAVLLYFLWRYNMLSALIGKTPKHIKLD 251 
LT F+IM+LGSGDY +AVTQSTFAAF+GM AS+ VL Y+LW+ +L+A+ K + +D 
Sbjct: 181 LTTFFIMKLGSGDYASAVTQSTFAAFIGMVASMGVLGYYLWKQGLIAAIFSKPDHTVSID 240 

Query: 252 TKEILIETIKEAIPFIITGAAIQIFKLIDQFSFGNTMALFTNYSSEELRVMFAYFSSNPG 311 

K +L+ET+KE+ 1 PFI+TG+AIQ F+LIDQ++F NTM LFT+YS +L V+F YF++NP 
Sbjct: 241 IKGLLLETLKESIPFIVTGSAIQAFQLIDQWTFVNTMTLFTDYSRSQLLVLFGYFNANPA 300 


Query: 312 KOTMILIAVATAIAGVGIPLLTENFVKiroKKAAARLvVNNLQMLLMFLLPAVAGSVILAK 371 

K+TM+LIAVA +1 GVGI LLTEN+VK D KAAARL + +NN+ +ML+MFLLPA+ G++ILA+ 
Sbjct: 301 KITMVLIAVAASIGGVGIALLTERYVKKDMKAAARLIINNIEMLVMFLLPALTGAIILAR 360 

Query: 372 PLYTVFYGLPQGQALGLWISLIQTIILSIYTVLAPMLQALFENRKAIIYFLYGLVAKVI 431 

PLY+VFYG + +A+ LFV L QT++L++YT+ +PMLQALFENRKAI YF YG++ K++ 
Sbjct: 361 PLYSVFYGASEERAIHLFVAVLFQTLLLALYTLFSPMLQALFENRKAIYYFAYGILIKLV 420 

Query: 432 LQLPS I FLFHAYGPLFSTTVALCI PVILMYLKIHEITGFKRQAIRRTSALVLILTLLMS F 491 
LQ+P I+L HAYGPL +TT+AL +P+ LMY +++++T F R+ +++ L LI TLLM 

Sbjct: 421 LQIPLIYLLHAYGPLLATTIALWPIYLMYRRLYQVTHFNRKLLQKRLLLTLIETLLMGL 480 

Query: 492 IISMIIWLMNLVIVPDSRLVSLVYIIVIGAIGLGVYGFMALATHLLDKMIGSRAQDLRRKL 552 

++ + WL+ P RL SL+Y+++IG +G+ VY + L TH LDK+IGS+A LR+KL 

Sbjct: 481 WFVANWLLGYAFKPTGRLTSLLYLLIIGGLGMTVYTALTLLTHQLDKLIGSKASRLRQKL 541 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
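
The Identities, Positives and Gaps figures quoted in the alignment summary lines can be extracted and re-checked programmatically. The following is a minimal illustrative sketch in Python and is not part of the original analysis. The function names `parse_summary` and `truncated_percent` are hypothetical, and the assumption that the printed percentage is the truncated (not rounded) ratio is only an observation from the sample line used here.

```python
import re

# One "Name = num/den (pct%)" field of a BLAST-style summary line.
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (matches, aligned_length, printed_percent)}."""
    return {
        name: (int(num), int(den), int(pct))
        for name, num, den, pct in FIELD_RE.findall(line)
    }


def truncated_percent(num, den):
    """Integer percentage with the fractional part dropped (assumed convention)."""
    return (100 * num) // den


if __name__ == "__main__":
    # Summary line of the GAS/GBS alignment shown above.
    line = "Identities = 320/541 (59%) , Positives = 431/541 (79%)"
    for name, (num, den, pct) in parse_summary(line).items():
        # Report whether the printed percentage matches the recomputed one.
        print(name, num, den, pct, truncated_percent(num, den) == pct)
```

A mismatch between the printed and recomputed percentage would suggest a transcription error in the counts or the percentage rather than a property of the proteins.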

Example 1183 

A DNA sequence (GBSx1259) was identified in S.agalactiae <SEQ ID 3683> which encodes the amino 
acid sequence <SEQ ID 3684>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4104 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06290 GB:AP001515 UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate ligase [Bacillus halodurans] 
Identities = 153/468 (32%) , Positives = 237/468 (49%) , Gaps = 23/468 (4%) 

Query: 33 NVTFNALSYDSRQISSDTLFFA-KGATFK-KEYLDSAITAGLSFYVSETDYGADIPVILV 90 
N +++ DSR++ LFF KG T +Y A++ G VSE +PV++V 
Sbjct: 21 NPDIHSIHMDSREVvEGGLFFCIKGYTVDGHDYAQQAVSNGAVAWSERPLELSVPVVVV 80 

Query: 91 NDIKKAMSLISMSFYNNPQNKLKLLAFTGTKGKTTAAYFAYHMLKVNHR-PAMLSTMNTT 149 
D ++AM+ ++ FY P N L+L+ TGT GKTT + +++ + ++ TM T 
Sbjct: 81 RDSRRAMAQVATKFYGEPTNDLQLIGVTGTNGKTTITHblEKIMQDQGKMTGLIGTMYTK 140 

Query: 150 LDGKSFFKSHLTTPESLDLFRMMATAVENQMTHLIMEVSSQAYLTKRVYGLTFDVGVFLN 209 
+ G ++ TTPESIi L R A ++ +T +MEVSS A + RV G FDV VF N 
Sbjct: 141 I - GHELKETKNTTPESL VLQRTFADMKKSGVTTAMMEVSSHALQSGRVRGCDFDVAVFSN 199 

Query: 210 ISPDHIGPIEHPTFEDYFFHKRLLME NSNAVWN SQMDHFNIVKEQVEYI 259 
++PDH+ H T E Y F K LL • V+N + D + QV 
Sbjct: 200 LTPDHLD--YHGTMERYKFAKGLLFAQLGNTYQGKVAVIjNADDPASADFAEMTIAQVvTY 257 

Query: 260 PHDFYGDY-SENVITESKAFSFHVKGKLEN-TYDIKLIGKFNQENAIAAGLACLRLGVSI 317 
+ D+ +ENV S +F + E I LIGKF+ N +AA A GV + 
Sbjct: 258 GIFJvIEADFQAEIWRITSTGTTFEI^FEEFIMELSIHLIGKFSVYNVIjAAAAAAYVSGVPL 317 

Query: 318 EDIKNGIAQTT-VPGRMEVLTQTNGAJCIFVDYAHNGDSLKKLLAVVEEHQKGDIILvIiGA 376 
++IK + + V GR E + + VDYAH DSL+ +L V E KGD+ +V+G 
Sbjct: 318 QEIKKSLEEVKGVAGRFETVKHDQPFWIVDYAHTPDSLENVLKTVGELAKGDVRVWGC 377 

Query: 377 PGNKGQSRRKDFGDVINQHPNLQVILTADDPNFEDPLVISQEIASHINRPVTIII-DREE 435 
G++ +++R ++ N Q I T+D+P E+P+ I +++ ++I DR+E 
Sbjct: 378 GGDRDKTKRPVMAEIATTFAN-QAIFTSDNPRSEEPMDILRDMEQGAKGDSYLMIEDRKE 436 

Query: 436 AIANASTLTNCKLDAI I IAGKGADAYQI IKGNRDNYSGDLEVAKKYLK 483 
AI A L + D I+IAGKG + YQ + ++ D VA++ +K 
Sbjct: 437 AIFKAIELAK-EDDIIVIAGKGHETYQQFRDRTIDFD-DRIVAQQAIK 482 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3685> which encodes the amino acid 
sequence <SEQ ID 3686>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4717 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 350/482 (72%) , Positives = 399/482 (82%) , Gaps = 1/482 (0%) 

Query: 1 MITIDKILEILKNDHNFREILFHEHYYYNWTQNVTENALSYDSRQISSDTLFFAKGATFK 60 
MITI+++L+ILK DHNFRE+L + Y+Y++ Q +F LSYDSRQ+ TLFFAKGATFK 
Sbjct: 1 MITIEQLLDILKKDHNFREVLDADGYHYHY-QGFSFERLSYDSRQVDGKTLFFAKGATFK 59 

Query: 61 KEYLDSAITAGLSFYVSETDYGADIPVILVMDIKKAMSLISMSFYNNPQNKLKIiLAFTGT 120 

+YL AIT GL Y+SE DY IPV+LV DIKKAMSLI+M+FY NPQ KLKLLAFTGT 
Sbjct: 60 ADYLKFAITNGLQLYISEVDYELGIPVVI.TODIKKAMSLIAMAFYGNPQEKLKLLAFTGT 119 


Query: 121 KGKTTAAYFAYHMLKVNHRPAMLSTMNTTLDGKSFFKSHLTTPESLDLFRMMATAVENQM 180 

KGKTTAAYFAYHMLK +++PAM STMNTTLDGK+FFKS LTTPESLDLF MMA V N M 
Sbjct: 120 KGKTTAAYFAYHMLKESYKPAMFSTMNTTLDGKTFFKSQLTTPESLDLFAMMAEOTTNGM 179 

Query: 181 THLIMEVSSQAYLTKRVYGLTFDVGVFLNISPDHIGPIEHPTFEDYFFHKRLLMENSNAV 240 

THLIMEVSSQAYL RVYGLTFDVGVFLNISPDHIGPIEHPTFEDYF+HKRLLMENS AV 
Sbjct: 180 THLIMEVSSQAYLVDRVYGLTFDVGVFIiNISPDHIGPIEHPTFEDYFYHKRLLMENSRAV 239 

Query: 241 WNSQMDHFNIVKEQVEYIPHDFYGDYSENVITESKAFSFHVKGKLENTYDIKLIGKFNQ 300 
V+NS MDHF+ + +QV H FYG S+N IT S+AFSF KG+L YDI+LIG FNQ 

Sbjct: 240 VINSGMDHFSFLADQVADQEHVFYGPLSDNQITTSQAFSFEAKGQLAGHYDIQLIGHFNQ 299 

Query: 301 ENAIAAGIACLRLGVSIEDIKNGIAQTTVPGRMEVLTQTNGAKIFvDYAHNGDSLKKLLA 360 
ENA+AAGIACLRLG S+ DI+ GIA+T VPGRMEVLT TN AK+FVDYAHNGDSL+KLL+ 
Sbjct: 300 ENAMAAGIACLRLGASLADIQKGIAKTRVPGRMEVLTMTNHAKVFVDYAHNGDSLEKLLS 359 

Query: 361 VVEEHQKGDIILVLGAPGNKGQSRRKDFGDVINQHPNLQVILTADDPNFEDPLVISQEIA 420 

WEEHQ G ++L+LGAPGNKG+SRR DFG VI+QHPNL VILTADDPNFEDP IS+EIA 
Sbjct: 360 VVEEHQTGKMLILGMGNKX3ESRRADFGRVIHQHPNLTVILTADDPNFEDPEDISKEIA 419 


Query: 421 SHINRPVTIIIDREEAIANASTLTNCKLDAIIIAGRGADAYQIIKGNRDNYSGDLEVAKKYL 482 

SHI RPV II DRE+AI A +L DA+IIAGKGADAYQI+KG + Y+GDL +AK YL 

Sbjct: 420 SHIARPVEI ISDREQAIQKAMSLCQGAKDAVI IAGKGADAYQIVKGQQVAYAGDLAIAKHYL 481 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1184 

A DNA sequence (GBSx1260) was identified in S.agalactiae <SEQ ID 3687> which encodes the amino 
acid sequence <SEQ ID 3688>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1421 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1185 

A DNA sequence (GBSx1261) was identified in S.agalactiae <SEQ ID 3689> which encodes the amino 
acid sequence <SEQ ID 3690>. This protein is predicted to be FhuA (fepC). Analysis of this protein 
sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2785 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9975> which encodes amino acid sequence <SEQ ID 9976> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98153 GB:AF251216 FhuC [Staphylococcus aureus] 
Identities = 141/259 (54%) , Positives = 193/259 (74%) 

Query: 7 MSHIKAENIIVSYDQKEIINNLSLSItNQKITTIIGANGCGKSTLLKALTRIHKIKDGTI 66 
M+ + + + + Y UN L + I + K+T+IIG NGCGKSTLLKAL+R+ +K+G + 

Sbjct: 1 MNRLHGQQVKIGYGDNTIINKLDVEIPDGKVTSIIGPNGCGKSTLLKALSRLLAVKEGEV 60 

Query: 67 TIDGHDIAHLPTKEIAKKIALLPQVLEATEGITVYELISYGRFPHQKYLGNLTNDDRSKI 126 
+DG +1 TKEIAKKIA+LPQ E +G+TV EL+SYGRFPHQK G LT +D+ +1 
Sbjct: 61 FLDGENIHTQSTKEIAKKIAILPQSPEVADGLTVGELVSYGRFPHQKGFGRLTAEDKKEI 120 

Query: 127 HWAMEMTNVAQFANRDVDDLSGGQRQKVWIAMAIAQDTDTIFLDEPTTYLDMNHQLEVLE 186 

WAME+T F +R + +DLSGGQRQ+ VWIAMALAQ TD I FLDEPTTYLD+ HQLE+LE 
Sbjct: 121 DWAMEOTGTDTFRHRSINDLSGGQRQRWIAMALAQRTDIIFLDEPTTYLDICHQLEILE 180 


Query: 187 LLKKLNDETQKTI IMVLHDLNLSARYSDYLVAMKTGKI I YEGSPSQIMTKDI I KDI FKID 246 

L++KLN E TI+MVLHD+N + R+SD+L+AMK G II GS ++T++I++ +F ID 
Sbjct: 181 LVQKLNQEQGCTIvMVLHDINQAIRFSDHLIAMKEGDIIATGSTEDVLTQEILEKVFNID 240 

Query: 247 AHIIQDPISKQPVLLSYQL 265 

+ +DP + +P+L++Y L 
Sbjct: 241 WLSKDPKTGKPLLVTYDL 259 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1929> which encodes the amino acid 
sequence <SEQ ID 1930>. Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2970 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 166/259 (64%) , Positives = 208/259 (80%) 

Query: 7 MSHIKAENIIVSYDQKEIINNLSLSIIJSrQKITTIIGANGCGKSTLLKALTRIHKIKDGTI 66 

M+ I AE++ ++Y+Q+ 11+ LS I KITTIIGANGCGKS+LLKALTR+ KG t 
Sbjct: 1 MTTISAEDLTIAYEQRTIIDKLSFYIPEGKITTIIGANGCGKSSLLKALTRLLPPKQGW 60 






Query: 67 TIDGHDIAHLPTKEIAKKIALLPQVLEATEGITVYELISYGRFPHQKYLGNLTNDDRSKI 126 

++G +IA L TKE+AKK+ALLPQV EAT GITVYEL+SYGRFPHQ Y GNL+ D+ I 
Sbjct: 61 YMGQNIATLETKEVAKKIALLPQVQEATNGITVYELVSYGRFPHQSYFGNLSPADKKAI 120 

Query: 127 HWAMEMTNVAQFANRDVDDLSGGQRQKWIAMRLAQDTDTIFLDEPTTYLDMlfflQLEVLE 186 

HWAM+ TNV +A++ VD LSGGQRQ+VW+AMALAQ TDTI FLDEPTTYLD+NHQLE+LE 
Sbjct: 121 HWAMQATNVMAYADQPVDALSGGQRQRVWLAMA^ 180 

Query: 187 LLKKIi^^^ETQKTII^WLHDIlNLSARYSDYLVAMKTGKIIYEGSPSQIMTKDIIKDIFKID 246 

L+K LN + KTI+MVLHDLNLSARYSD+L+AMK GKI Y G+ + +MT II+DIF+I 
Sbjct: 181 LWSmKDAGKTIVMVLHDLNLSARYSDHLIAMKHGKIHYTGTIADVMTSPIIQDIFQIK 240 

Query: 247 AHIIQDPISKQPVLLSYQL 265 

++ DPI P++L+YQL 
Sbjct: 241 PVLVDDPIHNCPIVLTYQL 259 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1186 

A DNA sequence (GBSx1262) was identified in S.agalactiae <SEQ ID 3691> which encodes the amino 
acid sequence <SEQ ID 3692>. This protein is predicted to be ferrichrome ABC transporter. Analysis of 
this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07609 GB:AP001520 ferrichrome ABC transporter (ferrichrome-binding protein) [Bacillus halodurans] 
Identities = 94/301 (31%) , Positives = 177/301 (58%) , Gaps = 11/301 (3%) 

Query: 6 I IVLTLLTFFLV- - - SCGQQTKQESTKTTISK- -MPKIEGFTYYGKIPENPKKVINFTYS 60 
+++LT+L F L+ +CG T E S+ M E T ++P NP++V+ 
Sbjct: 7 LLLLTMLLFALLWAACGSNTDAEQADELESEDGMITYESETGPIEVPANPQRW--ALG 64 

Query: 61 YTGYLLKLGVNVSSYSLDLEKDSPVFGKQIiKEAKKLTADDTEAIAAQKPDLIMVFDQDPN 120 
+TG +L L VNV K++P + + L++ +++ ++ E I PDLI+ + N 
Sbjct: 65 FTGNILAIjDVNWGVDT-WSKNNPNYEQLLQDVTEVSEENLEQIMELDPDLIIAYSTVQN 123 

Query: 121 INTLKKIAPTLVIKYGAQNYLDMMPALGKVFGKEKEANQWVSQWKTKTLAVKICDLHHILK 180 
L++IAPT++ Y +YL+ +GK+ KE+EA WV +K + +++ + 
Sbjct: 124 AEQLQEIAPTVLYTYNNLDYLEQHVEIGKLLI^EEAQAWVDDFKARAEQAGEEIKEKIG 183 

Query: 181 PNTTFTI^FYDKNIYLYGNNFGRGGELIYDSLGYAAPEKVKKDVFKKGWFTVSQEAIGD 240 
+ T ++++ ++ +Y++GNN+GRG E++Y ++ A PE+V++ G++ +S EA+ + 
Sbjct: 184 EDAWSVIETFEDQLYVFGNNWGRGTEILYQTMDLAMPERTOEMALADGYYALSFEALPE 243 

Query: 241 YVGDYALVNINKTTKKAASSLKESDVWKNLPAVKKGHI IESNYDVFYFSDPLSLEAQLKSF 301 
+ GDY +++ N +A +S +E++ ++++PAV+ G + E+N FYF+DPLSLE QL+ F 
Sbjct: 244 FAGDYIILSKN DEADNSFQETSJTYQSIPAVQNGQVFEANAKEFYFNDPLSLELQIiEFF 301 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3693> which encodes the amino acid 
sequence <SEQ ID 3694>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> May be a lipoprotein 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:BAB07609 GB:AP001520 ferrichrome ABC transporter (ferrichrome-binding protein) [Bacillus halodurans] 
Identities = 112/306 (36%) , Positives = 178/306 (57%) , Gaps = 3/306 (0%) 

Query: 2 KKLTLLLTLCLTTITLIACGNQATNHSNTASKSLSPMPQIAGVTYYGDIPKQPKRWSLA 61 
K L LL L + + ACG+ +S M T ++P P+RW+L 

Sbjct: 5 KHIiLLLTMLLFALLWAACGSNTDAEQADELESEDGMITYESETGPIEVPANPQRWALG 64 

Query: 62 STYTGYLKKlDI^LVGVTSYDKKNPIIiAKTVKKAKQVAATDLEAVTTLKPDIilVVGSTEE 121 
+TG + LD+N+VGV ++ K NP + ++ +V+ +LE + L PDLI+ ST + 
Sbjct: 65 --FTGNILALDVNWGVDTWSKNNPNYEQLLQDVTEVSEENLEQIMELDPDLIIAYSTVQ 122 

Query: 122 NIKQLAEIAPVISIEYRKRDYIiQVLSDFGRIFNKEDKAKKWLKDWKTKTAAYEPCEVKAVT 181 

N +QL EIAP + Y DYL+ + G++ NKE++A+ W+ D+K + +E+K 
Sbjct: 123 NAEQLQEIAPTVLYTYNNLDYLEQHVEIGKLLNKEEEAQAWVDDFKARAEQAGEEIKEKI 182 


Query: 182 GDKATFTIMGLYEKDVYLFGKDWGRGGEIIHQAFHYDAPEKVKTEVFKQGYLSLSQEVLP 241 

G+ AT +++ +E +Y+FG +WGRG EI++Q PE+V+ GY +LS E LP 

Sbjct: 183 GEDATVSVIETFEDQLYVFGNNWGRGTEILYQTMDLAMPERVEEMALADGYYALSFEALP 242 

Query: 242 DYIGDYVWAAEDDKTGSALYESKLWQSIPAVKKHHVIKVNANVFYFTDPLSLEYQLETL 301 

++ GDY+++ +++D+ ++ E+ +QSIPAV+ V + NA FYF DPLSLE QLE 
Sbjct: 243 EFAGDYIIL-SKNDEADNSFQETNTYQSIPAVQNGQVFEANAKEFYFNDPLSLELQLEFF 301 

Query: 302 REAILS 307 
+E LS 

Sbjct: 302 KEHFLS 307 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 140/316 (44%) , Positives = 212/316 (66%) , Gaps = 12/316 (3%) 


Query: 1 MKKIGIIV-LTLLTFFLVSCGQQTKQESTKTT--ISKMPKIEGFTYYGKIPENPKKVINF 57 

MKK+ +++ L L T L++CG Q S + +S MP+I G TYYG IP+ PK+V++ 
Sbjct: 1 MKKLTLLLTtiCLTTITLIACGNQATNHSNTASKSLSPMPQIAGVTYYGDIPKQPKRWSL 60 

Query: 58 TYSYTGYLLKLGVN---VSSYSLDLEKDSPVFGKQLKEAKKLTADDTEAIAAQKPDLIMV 114 
+YTGYL KL +N V+SY +K +P+ K +K+AK++ A D EA+ KPDLI+V 
Sbjct: 61 ASTYTGYLKKLDMNLVGVTSY DKKNPILAKTVKKAKQVAATDLEAVTTLKPDLIW 116 

Query: 115 FDQDPNINTLKKIAPTLVIKYGAQNYLDMMPALGKVFGKEKEANQWVSQWKTKTLAVKKD 174 
+ NI L +IAP + I+Y ++YL ++ G++F KE +A +W+ WKTKT A +K+ 

Sbjct: 117 GSTEENI KQLAE IAP VI S IEYRKRDYLQVXjSDFGRI FNKEDKAKKWLKDWKTKTAAYEKE 176 

Query: 175 LHHILKPNTTFTIMDFYDKNIYLYGNNFGRGGELIYDSLGYAAPEKVKKDVFKKGWFTVS 234 
+ + TFTIM Y+K++YL+G ++GRGGE+I+ + Y APEKVK +VFK+G+ ++S 

Sbjct: 177 VKA.VTGDKATFTIMGLYEKDVYLFGKDWGRGGEIIHQAFHYDAPEKVKTEVFKQGYLSLS 236 

Query: 235 QFAIGDYVGDYALWINKTTKKAASSLKESDVWKNLPAVKKGHIIESNYDVFYFSDPLSL 294 

QE + DY+GDY +V K S+L ES +W+++PAVKK H+I+ N +VFYF+DPLSL 

Sbjct: 237 QEVLPDYIGDYVWAAE--DDKTGSALYESKLWQSIPAVKKHHVIKVNANVFYFTDPLSL 294 






Query: 295 EAQLKSFTKAIKENTN 310 

E QL++ +AI + N 
Sbjct: 295 EYQLETLREAILSSEN 310 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1187 

A DNA sequence (GBSx1263) was identified in S.agalactiae <SEQ ID 3695> which encodes the amino 
acid sequence <SEQ ID 3696>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3431 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1188 

A DNA sequence (GBSx1264) was identified in S.agalactiae <SEQ ID 3697> which encodes the amino 
acid sequence <SEQ ID 3698>. This protein is predicted to be ferrichrome transport permease (permease). 
Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> May be a lipoprotein 

Final Results 

bacterial membrane Certainty=0.6095 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98154 GB:AF251216 FhuB [Staphylococcus aureus] 
Identities = 116/313 (37%) , Positives = 194/313 (61%) , Gaps = 3/313 (0%) 

Query: 26 ILFLIGCYASLRFGAINFKTSDLITVLKNPLKNSNAQDVIFDIRLPRIIAAILVGAAMSQ 85 
++ LI + S G + S +1 + N ++ Q++I +IR+PR IAA++VG A++ 
Sbjct: 28 MILLITLFISTLIGDAKIQASTIIEAIFNYNPSNQQQNIINEIRIPRNIAAVIVGMALAV 87 

Query: 86 AGAIMQGvTRNAIADPGLLGINAGAGLALWAYAFLGSMHYSTILIVCLLGSVISCLLVF 145 
+GAI+QGVTRN +ADP L+G+N+GA AL + YA L + + ++ LG+++ +V 
Sbjct: 88 SGAIIQGWRNGLADPALIGLNSGASFAIuALTYAVLPNTSFLILMFAGFLGAILGGAIVL 147 

Query: 146 TLSYTKQKGYHQLRLILAGAMISTLFTSVGQVVTLYFKLNRTVIGWQAGGLSQINWKMLI 205 
+ +++ G++ +R+IIAGA +S + T++ Q + L F+LN+TV W AGG+S W L 
Sbjct: 148 MIGRSRRDGFNPMRIILAGAAVSAMLTALSCGIAIAFRIiNQTVTFWTAGGVSGTTWSHLK 207 

Query: 206 IIAPIIILGLLISQLLAHQLTILSLNESVAKALGQKTQLMTAFLLLIVLFLSASSVALIG 265 
P+I + L I ++ QLTIL+L ES+AK LGQ ++ L+I + L+ +VA+ G 
Sbjct: 208 WAIPLIGIALFIILTISKQLTILNLGESIiAKGLGQNVTMIRGICLIIAMILAGIAVAIAG 267 

Query: 266 TVSFIGLIIPHFIKLFIPKDYRLLLPLIGFSGATFMIWVDLSSRIINPPSETSISSIISI 325 

V+F+GL++PH + I DY +LPL G ++ D+ +R + E + +IIS 
Sbjct: 268 QVAFVGLMVPHIARFLIGTDYAKILPLTALLGGILVLVADVIARYL GEAPVGAIISF 324 

Query: 326 VGLPCFLWLIRKG 338 

+G+P FL+L++KG 
Sbjct: 325 IGVPYFLYLVKKG 337 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3699> which encodes the amino acid 
sequence <SEQ ID 3700>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane Certainty=0.5437 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAF98154 GB:AF251216 FhuB [Staphylococcus aureus] 
Identities = 99/274 (36%) , Positives = 159/274 (57%) , Gaps = 1/274 (0%) 

Query: 34 LSFSLCTAIYCMIiRFGAVMiSHCJDIjNSILFG-KQNGHKftNVLIiAIRLPRLFGATLTGSAL 92 
LS L + ++ G + + +F + + N++ IR+PR A + G AL 

Sbjct: 26 LSMILLITLFISTLIGDAKIQASTIIEAIFNYNPSNQQQNIINEIRIPRNIAAVIVGMAL 85 

Query: 93 AVSGTIMQAITRNPIAEPGLLGINAGMLALVLAYAFVPHLHYSLIILLSLLGSSLAATL 152 
AVSG I+Q +TRN +A+P L+G+N+GA AL L YA +P+ + +++ LG+ L + 
Sbjct: 86 AVSGAIIQGOTRNGLADPALIGLNSGASFALALTYAVLPNTSFLILMFAGFLGAILGGAI 145 

Query: 153 VFGLSYQSGKGYHQLRLVIiAGAWSILLSALGQGITNYYHLANAVIGWQAGGLVGVNWQM 212 

V + G++ +R++LAGA VS +L+AL QGI + L V W AGG+ G W 

Sbjct: 146 VLMIGRSRmDGFNPmilLAGAAVSAMLTALSCGIAIAFRLNQTVTFWTAGGVSGTTWSH 205 


Query: 213 IGYIAPLIILSLCLAQLLSYHLTVLSLSESCAIQUjGQKTNLISAVFMILVLILSSAAVAl 272 

+ + PLI ++L + +S LT+L+L ES AK LGQ +1 + +1+ +IL+ AVAI 
Sbjct: 206 LKWAIPLIGIALFIILTISKQLTILNLGESLAKGLGQNVTMIRGICLIIAMILAGIAVAI 265 

Query: 273 AGS I S F IGLVI PHLMKHFTPHHYRYLLPLCAVSG 306 

AG ++F+GL++PH+ + Y +LPL A+ G 

Sbjct: 266 AGQVAFVGLMVPHIARFLIGTDYAKILPLTALLG 299 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 158/295 (53%), Positives = 214/295 (71%), Gaps = 1/295 (0%) 

Query: 6 KKLVQKNKSNHFWLVFFITLILFLIGCYASLRFGAINFKTSDLITVLKNPLKNSNAQDVI 65 

KK KS+ FWLVF + + Y LRFGA+ DL ++L +N + +V+ 

Sbjct: 16 KKTQIITKSHIFWLVFVLLSFSLCTAIYCHLRFGAVALSHQDLNSILFGK-QNGHKANVL 74 



Query: 66 FDIRLPRIIAAILVGAAMSQAGAIMQGVTRNAIADPGLLGINAGAGLALVVAYAFLGSMH 125 

IRLPR+ A L G+A++ +G IMQ +TRN IA+PGLLGINAGAGLALV+AYAF+ +H 
Sbjct: 75 l^IRLPRLFGATLTGSALAVSGTIMQAITRNPIAEPGLIiGINAGAGIiALVLAYAFVPHLH 134 



Query: 126 YSTILIVCLLGSVISCLLVFTLSYTKQKGYHQLRLILAGAMISTLFTSVGQVVTLYFKLN 185 

YS I+++ LLGS ++ LVF LSY KGYHQLRL+LAGAM+S L +++GQ +T Y+ L 
Sbjct: 135 YSLIILLSLLGSSIJ^TLVFGLSYQSGKGYHQLRLVIAGAIWSILLSALGC^ITNYYHLA 194 



Query: 186 RTVIGWQAGGLSQINWKMLI I IAPI I ILGLLISQLLAHQLTILSLNESVAKALGQKTQLM 245 
VIGWQAGGL +NW+M+ IAP+IIL L ++QLL++ LT+LSL+ES AKALGQKT L+ 
Sbjct: 195 NAVIGWQAGGLVGV]MQMIGYIAPLIILSLCIAQLLSyHLTVLSLSESQAKALGQKTNLI 254 

Query: 246 TAFLLLIVLFLSASSVALIGTVSFIGLIIPHFIKLFIPKDYRLLLPLIGFSGATF 300 
+A +++VL LS+++VA+ G++SFIGL+IPH +K F P YR LLPL SGA+F 

Sbjct: 255 SAVFMILVLILSSAAVAIAGSISFIGLVIPHLMKHFTPHHYRYLLPLCAVSGASF 309 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1189 

A DNA sequence (GBSx1265) was identified in S.agalactiae <SEQ ID 3701> which encodes the amino 
acid sequence <SEQ ID 3702>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1492 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
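
The localisation lines listed under "Final Results" in these Examples can be read into a small table for comparison across proteins. The following is a minimal illustrative sketch in Python and is not part of the original analysis. The function names `parse_result` and `best_localisation` are hypothetical, and the pattern deliberately tolerates stray spaces inside the certainty value, as can occur in scanned text.

```python
import re

# "bacterial <site> ... Certainty=<value> (<verdict>)", allowing stray spaces
# inside the numeric value (e.g. "0 . 1492").
RESULT_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\b.*?"
    r"Certainty\s*=\s*([0-9.\s]+?)\s*\((Affirmative|Not Clear)\)"
)


def parse_result(line):
    """Return (site, certainty, verdict), or None if the line does not match."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    certainty = float(re.sub(r"\s+", "", m.group(2)))  # drop stray spaces
    return m.group(1), certainty, m.group(3)


def best_localisation(lines):
    """Return the parsed result with the highest certainty, or None."""
    parsed = [r for r in map(parse_result, lines) if r is not None]
    return max(parsed, key=lambda r: r[1]) if parsed else None


if __name__ == "__main__":
    block = [
        "bacterial cytoplasm Certainty=0 . 1492 (Affirmative) < suco",
        "bacterial membrane Certainty=0 . 0000 (Not Clear) < suco",
        "bacterial outside Certainty=0 . 0000 (Not Clear) < suco",
    ]
    print(best_localisation(block))
```

Selecting the highest certainty is only one possible summary; the verdict field (Affirmative or Not Clear) is kept so that low-confidence calls remain visible.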

Example 1190 

A DNA sequence (GBSx1266) was identified in S.agalactiae <SEQ ID 3703> which encodes the amino 
acid sequence <SEQ ID 3704>. This protein is predicted to be ferrichrome transport permease (permease). 
Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial membrane Certainty=0.5140 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98155 GB:AF251216 FhuG [Staphylococcus aureus] 
Identities = 122/334 (36%) , Positives = 208/334 (61%) , Gaps = 3/334 (0%) 

Query: 1 MIQKNKAPFVLISSVIILLLLILV SISLGYANTSVIDVLKLISGKSDDAFLFIITNI 57 

MI N LI+ + +LL L SI+ GNV K + G+D I+ + 

Sbjct: 1 MISSNNKRRQLIALAVFSILLFLGCTWSITSGEYNIPVERFFKTLIGQGDAIDELILLDF 60 



Query: 58 RLPRIIVCIFGGASLGIAGLLLQTLTKNPLADSGILGINAGAGLVIALTIGTFNVSNPTI 117 
RLPR+++ I GA+L I+G ++Q++TKNP+A+ GILGINAG G IAL I ++ 

There is also homology to SEQ ID 396. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1191 

A DNA sequence (GBSx1267) was identified in S.agalactiae <SEQ ID 3705> which encodes the amino 
acid sequence <SEQ ID 3706>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.3785 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC05779 GB:AF051356 unknown [Streptococcus mutans] 
Identities = 49/93 (52%) , Positives = 63/93 (67%) 

Query: 1 MILTFNPGKLERQEFFKELINYLWIHDDVTLRKIKSHFTDYSKIDRLLEEYINHGYILRQ 60 

MI +N KL RQ FF +LINYL IHDDVTLR+IK +F D ++R +E+Y+ GY+LR+ 
Sbjct: 1 MIKIYNGDKLTRQPFFIKLINYLQIHDDVTLRQIKRNFADTEHLERSIEDYVQAGYVLRE 60 

Query: 61 NKRYSLNLPFLSSLDGLVLDDLVFIDSDSQIYQ 93 

NK Y L +LDGL LD +F+D S IYQ 

Sbjct: 61 NKHYYNAFELLENLDGLTLDSQIFVDDQSSIYQ 93 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3707> which encodes the amino acid 
sequence <SEQ ID 3708>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3447 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 108/212 (50%) , Positives = 143/212 (66%) 

Query: 1 MILTFNPGKLERQEFFKELINYLWIHDDVTLRKIKSHFTDYSKIDRLLEEYINHGYILRQ 60 

MI F+ KL RQ FF++LINYL HD V LR+IK F + + ID+ +E Y+ GYI R+ 
Sbjct: 1 MITVFHSDKLTRQPFFQDLINYLDQHDHVILREIKKAFPNVTGIDKAIESYVQAGYIRRE 60 

Query: 61 NKRYSLNLPFLSSLDGLVLDDLVFIDSDSQIYQLLQKRKFVTNLDNPTNHLVFVEETDFE 120 

NKRY +NLP +SS L LD ++F+D+ S +Y+ + F T L N TN ++ E+T+ 
Sbjct: 61 NKRYGINLPLVSSDQQLALDTMLFVDTCSAMYENILAWFETQLTNQTNRVMIKEKTNIT 120 

Query: 121 RNTLTLSNYFYKLTNGYPLSREQKKLYQLLGDVNSEYALKYMSSFILKFLRKDSVKQKRT 180 

R+ LTL+NYFY+L G S EQ LY LLGDVN EYALKYM++F+LKF RKD V QKR 
Sbjct: 121 RDDLTLANYFYRLKRGEKPSAEQMDLYDLLGDVNQEYALKYMTTFLLKFTRKDFVMQKRP 180 

Query: 181 VIFIQALELLGYISLNQDTTYRLNAKLDVEAL 212 

IF++AL LGY+ + TTY+L LD E+L 
Sbjct: 181 DIFVEALVTLGYLKQVEPTTYQLLMTLDKESL 212 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1192 

A DNA sequence (GBSx1268) was identified in S.agalactiae <SEQ ID 3709> which encodes the amino 
acid sequence <SEQ ID 3710>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0824 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB39104 GB:U57759 intrageneric coaggregation-relevant adhesin [Streptococcus gordonii] 
Identities = 261/311 (83%) , Positives = 283/311 (90%) 

Query: 1 MSKILVFGHQNPDSDAIGSSVAFAYLAKEAWGLDTEAVALGTPNEETAYVLDYFGVQAPR 60 

MSKILVFGHQNPDSDAIGSS AFAYLA+EA+GLDTFAVALG PNEETA+VLDYFGV APR 
Sbjct: 1 MSKIIiVFGHQNPDSDAIGSSYAFAYLAREAYGLDTEAVALGEPNEETAFVLDYFGVAAPR 60 


Query: 61 VVESAKAEGVETVILTDHNEFQQSISDIKDVWYGVVDHHRVANFETANPLYMRLEPVGS 120 

V+ SAKAEG E VILTDHNEFQQS++DI +V VYGWDHHRVANFETANPLYMRLEPVGS 
Sbjct: 61 VITSAKAEGAEQVILTDHNEFQQSVADIAEVEVYGWDHHRVANFETANPLYMRLEPVGS 120 

Query: 121 ASSIVYRMFKENGVSVPKELAGLLLSGLISDTLLLKSPTTHASDIPVAKELAEIAGVNLE 180 

ASS I VYRMFKE+ V+V KE+AGL+LSGLISDTLLLKSPTTH +D +A ELAELAGVNLE 
Sbjct: 121 ASSIVYRMFKEHSVAVSKEIAGLMLSGLISDTLLLKSPTTHPTDKAIAPELAELAGVNLE 180 

Query: 181 EYGLEMLKAGTNLSSKTAAELIDIDAKTFELNGEAWVAQVNTVDINDIIARQEEIEVAI 240 
EYGL MLKAGTNL+SK+A ELIDIDAKTFELNG WVAQVNTVDI ++L RQ EIE AI 

Sbjct: 181 EYGLAMLKAGTNIASKSAEELIDIDAKTFEI^GNNTOVAQ vWTA/DIAEVLERQAE I EAAI 240 

Query: 241 QEAIVTEGYSDFVLMITDIWSNSEI1ALGSNMAKVEAAFEFTLENNHAFLAGAVSRKKQ 300 
++AI GYSDFVLMITDI+NSNSEIliA+GSNM KVEAAF F LENNHAFLAGAVSRKKQ 
Sbjct: 241 EKAIADNGYSDFVLMITDIINSNSEIl^IGSNMDKVEAAFNFvLENNHAFLAGAVSRKKQ 300 

Query: 301 WPQLTESYNA 311 

WPQLTES+NA 
Sbjct: 301 WPQLTES FNA 311 


A related DNA sequence was identified in S.pyogenes <SEQ ID 3711> which encodes the amino acid 
sequence <SEQ ID 3712>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -2.02 Transmembrane 141 - 157 ( 141 - 157) 

Final Results 

bacterial membrane Certainty=0.1808 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 



A related sequence was also identified in GAS <SEQ ID 9103> which encodes the amino acid sequence 
<SEQ ID 9104>. Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -2.02 Transmembrane 139 - 155 ( 139 - 155) 

Final Results 

bacterial membrane Certainty= 0.181 (Affirmative) < succ> 
bacterial outside Certainty= 0.000 (Not Clear) < succ> 
bacterial cytoplasm Certainty= 0.000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 253/311 (81%) , Positives = 283/311 (90%) 

Query: 1 MSKILVFGHQNPDSDAIGSSVAFAYLAKEAWGLDTEAvALGTPNEETAYVLDYFGVQAPR 60 

MSKILVFGHQNPD+DAI SS AF YL+++A+GLDTE VALGTPNEETA+ LDYFGV+APR 
Sbjct: 3 MSKILVFGHQNPDTDAIASSYAFDYLSQKAFGLDTEWALGTPNEETAFALDYFGVEAPR 62 

Query: 61 WESAKAEG VETVI LTDHNEFQQS I SD I KDVTVYGWDHHRVANFETANPLYMRLEPVGS 120 
WESAKA+G E VILTDHNEFQQSI+DI++V VYGWDHHRVAMFETANPLYMR+EPVGS 

Sbjct: 63 WESAKAQGSEQVILTDHNEFQQSIADIREvEVYGWDHHRVANFETANPLYMRVEPVGS 122 

Query: 121 ASS I VYRMFKENGVSVPKELAGLLLSGLI SDTLLLKS PTTHASDI PVAKELAELAGVNLE 180 
ASSIVYRMFKENG+ VPK +AG+LLSGLISDTLI1LKSPTTH SD VA+ELAELA VNLE 
Sbjct: 123 ASSIWRMFKENGIEVPKAIAGMLLSGLISDTLLLKSPTTHVSDHLVAEELAELAEVNLE 182 

Query: 181 EYGLEMLKAGTOLSSKTAAELIDIDAKTFELNGEAVRVAQVNTVDINDILARQEEIEVAI 240 

+YG+ +LKAGTNL+SK+ ELI IDAKTFELNG AVRVAQVNTVDI ++L RQE IE AI 
Sbjct: 183 DYGMALLKAGTNLASKSEvELIGIDAKTFELMGNATOVAQWTVDIAEVLERQEAIEAAI 242 


Query: 241 QEAIOTEGYSDFvlMITDIWSNSEILALGSNMAKVEAAFEFTLENNHAFLAGAVSRKKQ 300 

++A+ EGYSDFVLMITDIVNSNSEILA+G+NM KVEAAF FTL+NNHAFLAGAVSRKKQ 
Sbjct: 243 KDAMAAEGYSDFVLMITDIWSNSEIIAIGAmDK^mAAFNFTLDNNHAFLAGAVSRKKQ 302 

Query: 301 WPQLTESYNA 311 

WPQLTES+ A 
Sbjct: 303 WPQLTESFGA 313 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1193 

A DNA sequence (GBSx1269) was identified in S.agalactiae <SEQ ID 3713> which encodes the amino 

acid sequence <SEQ ID 3714>. Analysis of this protein sequence reveals the following: 

Possible site: 20 
35 >» Seems to have no N-terrainal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2769 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 
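
Each "Final Results" block lists three candidate locations with a certainty value. As a minimal sketch only, the Python fragment below reads such a block and selects the location with the highest certainty. The function names are hypothetical, and the pattern deliberately accepts stray spaces inside the number because scanned copies of these reports often contain them.

```python
import re

# "bacterial <site> [---] Certainty=<number> (<verdict>)"
# The number group allows embedded spaces, e.g. "0. 276 9".
LINE = re.compile(
    r"bacterial\s+(\w+)[\s-]*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)

def parse_localization(text):
    """Return a list of (site, certainty, verdict) tuples."""
    results = []
    for site, number, verdict in LINE.findall(text):
        certainty = float(number.replace(" ", ""))
        results.append((site, certainty, verdict.strip()))
    return results

def most_likely(results):
    """Pick the entry with the highest certainty value."""
    return max(results, key=lambda entry: entry[1])

block = """\
bacterial cytoplasm Certainty=0. 276 9 (Affirmative) < suco
bacterial membrane Certainty=0 . 0000 (Not Clear) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
"""
results = parse_localization(block)
print(most_likely(results))
```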

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC05773 GB:AF051356 pyruvate-formate lyase activating enzyme 
[Streptococcus mutans] 
Identities = 184/260 (70%), Positives = 217/260 (82%) 

Query: 3 EIDYKKVTGMIHSTESFGSVDGPGIRFIIFMQGCK^CQYCHNPDTWEMETNNSKERTVE 62 

++DY+KVTG+++STESFGSVDGPGIRF++FMQGC+MRCQYCHNPDTW M+ + + ERT 
Sbjct: 4 KVDYEKVTGLVNSTESFGSVDGPGIRFVVFMQGCQMRCQYCHNPDTWAMKNDRATERTAG 63 






Query: 63 DVLKFALRYKHFWGKDGGITVSGGEAMLQIDFITALFIEAKKLGIHTTLDTCGFAYRATP 122 

DV KEALR+K FWG GGITVSGGEA LQ+DF+ ALF AK+ GIHTTLDTC +R TP 
Sbjct: 64 DVFKEALRFKDFWGDTGGITVSGGEATLQMDFLIALFSLAKEKGIHTTLDTCALTFRNTP 123 



Query: 123 EYHAILEKLLDOTDLvLLDLKEIDSEQHKIVTRQSNKNILQFARYLSDRGTPVWIRHVLV 182 

+Y EKL+ VTDLVLLD+KEI+ +QHKIVT SNK IL ARYLSD G PVWIRHVLV 
Sbjct: 124 KYLEKYEKLmVTDLVLLDIKEINPDQHKIOTGHSNKTIIACARYLSDIGKPVWIRHVLV 183 






Query: 183 PGLTDIDDHLKRLGEFVQTLDNVDKFEvLPYHTMGEFKWRELGIPYPLAGVKPPTPERVK 242 

PGLTD D+ L +LGE+V+TL NV +FE+LPYHTMGEFKWRELGIPYPL GVKPPTP+RV+ 
Sbjct: 184 PGLTDRDEDLIKLGEWKTLKNVQRFEILPYHTMGEFKWRELGIPYPLEGVKPPTPDRVR 243 

Query: 243 NAKDIMKTESYTEYLKRIQN 262 

NAK +M TE+Y EY KRI + 
Sbjct: 244 NAKKLMHTETYEEYKKRINH 263 
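
The middle line of each alignment block repeats a residue where query and subject are identical. As an illustrative sketch only, the Python fragment below counts identical positions in one aligned pair of rows; the function name `count_identities` is hypothetical, and the two strings are the final block of the alignment above.

```python
def count_identities(query, subject):
    """Count aligned positions where both rows carry the same residue.

    Both rows must have the same length; gap characters ("-") never
    count as identities.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")

query = "NAKDIMKTESYTEYLKRIQN"
subject = "NAKKLMHTETYEEYKKRINH"
identical = count_identities(query, subject)
print(identical, "of", len(query))
```

Summing this count over every block of an alignment gives the numerator of the "Identities" figure in the summary line.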

A related DNA sequence was identified in S.pyogenes <SEQ ID 3715> which encodes the amino acid 
sequence <SEQ ID 3716>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 223/260 (85%) , Positives = 239/260 (91%) 



Query: 1 MAEIDYKKVTGMIHSTESFGS VDGPGIRFI I FMCGCKMRCQYCHNPDTWEMETNNSKERT 60 
M E DY +VTGM+HSTESFGSVDGPGIRFIIF+QGCK+RCQYCHNPDTWEMETNNSK RT 
Sbjct: 25 MTEKDYGQVTGMVHSTESFGSVDGPGIRFIIFLQGCKLRCQYCHNPDTWEMETNNSKIRT 84 

Query: 61 VEDVLKEALRYKHFWGKDGGITVSGGEAMLQIDFITALFIEAKKLGIHTTLDTCGFAYRA 120 
V DVLKEAL+YKHFWGK GGITVSGGEAMLQIDFITALFIEAKKLGIHTTLDTCGF YR 
Sbjct: 85 Vl^VLKEALQYKHFWGKKGGITVSGGEAMLQIDFITALFIEAKKLGIHTTLDTCGFTYRP 144 

Query: 121 TPEYHAILEKLLDVTDLVLLDLKEIDSEQHKIVTRQSNKNILQFARYLSDRGTPVWIRHV 180 
TPEYH +L+ LL VTDL+LLDLKEID +QHKIVTRQ NKNILQFARYLSD+ PVWIRHV 
Sbjct: 145 TPEYHQVLDNLLAOTDLILLDLKEIDEKQJIKIVTRQPNKNILQFARYLSDKQIPvWIRHV 204 

Query: 181 LVPGLTDIDDHLKRLGEFVQTLDNvDKFEVLPYHTMGEFKWRELGIPYPLRGVKPPTPER 240 
LVPGLTDIDDHL RLGEFV+TL NVDKFEVLPYHTMGEFKWRELGIPY L GVKPPT ER 
Sbjct: 205 LVPGLTDIDDHLTRLGEFVKTLKNVDKFEVLPYHTMGEFKWRELGIPYQLEGVKPPTKER 264 

Query: 241 VKNAKDIMKTESYTEYLKRI 260 
V+NAK++M+TESYTEY+ RI 
Sbjct: 265 VQNAKNLMQTESYTEYMNRI 284 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1194 

A DNA sequence (GBSxl270) was identified in S.agalactiae <SEQ ID 3717> which encodes the amino 
acid sequence <SEQ ID 3718>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -7.06 Transmembrane 105 - 121 ( 103 - 126) 
INTEGRAL Likelihood = -5.57 Transmembrane 137 - 153 ( 136 - 162) 



Final Results 



bacterial cytoplasm Certainty=0.4614 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 



Final Results 



bacterial membrane Certainty=0.3824 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC05772 GB:AF051356 putative hemolysin [Streptococcus mutans] 
Identities = 347/445 (77%) , Positives = 406/445 (90%) , Gaps = 1/445 (0%) 



Query: 1 MQDPGSQSLLLQFVILLILTLFNAFFSASEMALVSLNRSKVEQKAEEGDKRYRRLLDVLE 60 
M+DPGSQSL+LQF++LLILTL NAFFSA+EMALVSLNR++VEQKAEEG+K+Y RLL VLE 
Sbjct: 1 MEDPGSQSLILQFLLLLILTLCMAFFSATEMALVSUsn^VEQKAEEGEKK^IRLLKVLE 60 

Query: 61 NPNNFLSTIQVGITFI SLLQGASLSASLGHVISGWLGNSATARTAGS I IALI FLTYVSIV 120 

NPNNFLSTIQVGIT I+LL GASL+ SLG 1+ W GNSATARTAGS+I+L FLTY+SIV 
Sbjct: 61 NPNNFLSTIQVGITLITLLSGASLADSLGREIAVWFGNSATARTAGSLISLAFLTYISIV 120 

Query: 121 LGELYPKRIAMNLKDRLAIVSAPI 1 1 FLGKIVSPFVWLLSASTNLLSRITPMTFDDADEK 180 

LGELYPKRIAMNLK+ LA+ +SAP+ 1 1 FLGK+VSPFVWLLS STNLLSR+TPMTFDDADEK 
Sbjct: 121 LGELYPKRIAMNLKENLAVLSAPVI I FLGKVVSPFVWLLSVSTNLLSRLTPMTFDDADEK 180 

Query: 181 MTRDEIEYMLTNSEETLEAEEIEMLQGIFSLDEMMAREVMVPRTDAFMIDINNDAQSNIE 240 

MTRDEIEYMLTNSEETL+A+EIEMLQG+FSLDE+MAREVMVPRTDAFM+DIN+D+ 1+ 
Sbjct: 181 MTRDEIEYMLTNSEETLDADEIEMLQGVFSLDELMAREVMVPRTDAFMVDINDDSSDIIQ 240 

Query: 241 GILSQNFSRVPVFDDDKDRWGVLHTKRLLEAGFKTGFDTIDLRKILQEPLFVPETIFVD 300 

IL++ FSR+PV+DDDKD+++G++HTK LL AGFK GFD I+LR+ILQEPLFVPETI V+ 
Sbjct: 241 TILNERFSRIPVYDDDKDKIIGIIHTKNLLNAGFKEGFDHINLRRILQEPLFVPETIWN 300 

Query: 301 DLLKALRNTQNQMAI LLDE YGGVAGLVTLEDLLEEIVGE I DDETDTAEQFVRE IDENI YI 360 
DLL AL+NTQNQMAILLDEYGGVAGLVTLEDLLEEIVGEIDDETD VREI +N YI 

Sbjct: 301 DLLTALKNTQNQMAILLDEYGGVAGLVTLEDLLEEIVGEIDDETDKTAISVREIADNTYI 360 

Query: 361 VLGTMTMEFNDYFETELESDDVDTIAGYYLTGVGSIPNQEEKVAYEVDSKDKHITLIND 420 
VLGTMTLN+FN+YFET+LESD+VDTIAG+YLTGVG+IP+QEEK +EV+S KH+ LIND 
Sbjct: 361 VLGTMTLNDFNEYFETDLESDNUDTIAGFYLTGVGTIPSQEEKEHFEVESNGKHLELIND 420 

Query: 421 KVKDGRITKLKVLLSDIEQ-NIEKD 444 

KVKDGR+TKLK+L+S++E+ EKD 
Sbjct: 421 KVKDGRVTKLKILVSEVEEKEDEKD 445 


A related DNA sequence was identified in S.pyogenes <SEQ ID 3719> which encodes the amino acid 
sequence <SEQ ID 3720>. Analysis of this protein sequence reveals the following: 
Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -8.76 Transmembrane 22 - 38 ( 16 - 47) 

INTEGRAL Likelihood = -5.57 Transmembrane 118 - 134 ( 117 - 138) 

INTEGRAL Likelihood = -3.19 Transmembrane 150 - 166 ( 149 - 169) 
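
Each "INTEGRAL" line reports a likelihood score, a predicted transmembrane span and, in parentheses, a wider range around it. The Python sketch below is an illustration only: it parses such lines into records and ranks them. The record keys `core` and `extended` are hypothetical labels for the two ranges, and treating the most negative likelihood as the strongest prediction is an assumption about how the scores are read.

```python
import re

# "INTEGRAL Likelihood = -8.76 Transmembrane 22 - 38 ( 16 - 47)"
SEGMENT = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return one dict per predicted transmembrane segment."""
    segments = []
    for score, start, end, outer_start, outer_end in SEGMENT.findall(text):
        segments.append({
            "likelihood": float(score),
            "core": (int(start), int(end)),
            "extended": (int(outer_start), int(outer_end)),
        })
    return segments

report = """\
INTEGRAL Likelihood = -8.76 Transmembrane 22 - 38 ( 16 - 47)
INTEGRAL Likelihood = -5.57 Transmembrane 118 - 134 ( 117 - 138)
INTEGRAL Likelihood = -3.19 Transmembrane 150 - 166 ( 149 - 169)
"""
segments = parse_segments(report)
# Assumption: a more negative score marks a stronger prediction.
strongest = min(segments, key=lambda seg: seg["likelihood"])
core_lengths = [seg["core"][1] - seg["core"][0] + 1 for seg in segments]
print(strongest["core"], core_lengths)
```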

Final Results 

bacterial membrane Certainty=0.4503 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:AAC05772 GB:AF051356 putative hemolysin [Streptococcus mutans] 
Identities = 343/443 (77%) , Positives = 401/443 (90%) 

Query: 14 MEDPVSQSLVIQFLLLWLTLLNAFFSASEMALVSLNRSRVEQKAADGDKKYARLLRVLE 73 
MEDP SQSL++QFLLL++LTL NAFFSA+EMALVSLNR+RVEQKA +G+KKY RLL+VLE 

Sbjct: 1 MEDPGSQSLILQFLLLLILTLCNAFFSATEMALVSLNRARVEQKAEEGEKKYIRLLKVLE 60 

Query: 74 EPNHFLSTIQVGITFISLLSGASLSASLGKVISGWLGNSATARTAGTIISLVFLTYVSIV 133 
PN+FLSTIQVGIT I+LLSGASL+ SLG+ 1+ W GNSATARTAG++ISL FLTY+SIV 
Sbjct: 61 NPNNFLSTIQVGITLITLLSGASLADSLGREIAVWFGNSATARTAGSLISLAFLTYISIV 120 

Query: 134 LGELYPKRIAMNLKDKLAIVSAPI I IGLGRLVSPFVWLLSASTNLLSRLTPMTFDDADEQ 193 

LGELYPKRIAMNLK+ LA++SAP+II LG++VSPFVWLLS STNLLSRLTPMTFDDADE+ 
Sbjct: 121 LGELYPKRIAMNLKENLAVLSAPVI IFLGKVVSPFVWLLSVSTNLLSRLTPMTFDDADEK 180 


Query: 194 MTRDEIEYMLSKSEATLDAEEIEMLQGVFSLDEMMAREVMVPRTDAFMIDINDDPLENIQ 253 

MTRDEIEYML+ SE TLDA+EIEMLQGVFSLDE+MAREVMVPRTDAFM+DINDD + IQ 
Sbjct: 181 MTRDEIEYMLTNSEETLDADEIEMLCjGVFSLDELMAREVMVPRTDAFMVDINDDSSDIIQ 240 

Query: 254 EILKQSFSRIPVYDVDKDKIIGLIHTKRLLESGFRQGFDQINMRKMLQEPLFVPETIFvD 313 
IL + FSRIPVYD DKDKIIG+IHTK LL +GF++GFD IN+R++LQEPLFVPETI V+ 
Sbjct: 241 TII^RFSRIPWDDDIOJKIIGIIHTKmLNAGFKEGFDHINLRRILQEPLFVPETIVVN 300 

Query: 314 DLIiRQLROTQNQ^ILLDEYGGVAGLVTLEDLLEEIVGEIDDETDKAEQFVHEIGDNTYI 373 
DLL L+NTQNQMAILLDEYGGVAGLVTLEDLLEEIVGEIDDETDK V EI DNTYI 

Sbjct: 301 DLLTALKNTQNQMAI LLDE YGGVAGLVTLEDLLEEI VGEIDDETDKTAI SVRE IADNTYI 360 

Query: 374 VVGTMTIjNEFNDYFDTELESDDVDTIAGFYLTGIGTIPSQEQKFAYEIDNKDKHLVLIND 433 
V+GTMTLN+FN+YF+T+LESD+VDTIAGFYLTG+GTIPSQE+KE +E+++ KHL LIND 
Sbjct: 361 VLGTMTLNDFNEYFETDLESDNVDTIAGFYLTGVGTIPSQEEKEHFEVESNGKHLELIND 420 

Query: 434 KVKDGRITKLKLILSNIEQI IEE 456 

KVKDGR+TKLK+++S +E+ +E 
Sbjct: 421 KVKDGRVTKLKI LVSEVEEKEDE 443 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 364/444 (81%) , Positives = 417/444 (92%) 

Query: 1 MQDPGSQSLLLQFVILLILTLFNAFFSASEMALVSLNRSKVEQKAEEGDKRYRRLLDVLE 60 
M+DP SQSL++QF++L++LTL NAFFSASEMALVSLNRS +VEQKA +GDK+Y RLL VLE 

Sbjct: 14 MEDPVSQSLVIQFLLLWLTLLNAFFSASEMALVSLNRSRVEQKAADGDKKYARLLRVLE 73 

Query: 61 NPNNFLSTIQVGITFISLLQGASLSASLGHVISGWLGNSATARTAGSIIALIFLTYVSIV 120 
PN+FLSTIQVGITFISLL GASLSASLG VISGWLGNSATARTAG+II+L+FLTYVSIV 
Sbjct: 74 EPNHFLSTIQVGITFISLLSGASLSASLGKVISGWLGNSATARTAGTIISLVFLTYVSIV 133 

Query: 121 LGELYPKRIAMNLKDRLAIVSAPIIIFLGKIVSPFVWLLSASTNLLSRITPMTFDDADEK 180 

LGELYPKRIAMNLKD+LAIVSAPIII LG++VSPFVWLLSASTNLLSR+TPMTFDDADE+ 
Sbjct: 134 IjGELYPKRIAMNLKDKLAIVSAPIIIGLGRLVSPFVWLLSASTNLLSRLTPMTFDDADEQ 193 


Query: 181 MTRDEIEYMLTNSEETLEAEEIEMLQGIFSLDE^#IAREVMVPRTDAFMIDINNI)fiQSNIE 240 

MTRDEIEYML+ SE TL+AEEIEMLQG+FSLDEMMAREVMVPRTDAFMIDIN+D NI+ 
Sbjct: 194 MTRDEIEYMLSKSEATLDAEEIEMLQGVFSLDEMMAREVMVPRTDAFMIDINDDPLENIQ 253 

Query: 241 GILSQNFSRVPVFDDDKDRWGVLHTKRLLEflGFKTGFDTIDLRKILQEPLFVPETIFVD 300 

IL Q+FSR+PV+D DKD+++G++HTKRLLE+GF+ GFD I ++RK+LQEPLFVPETI FVD 
Sbjct: 254 EILKQSFSRIPVYDVDKDKIIGLIHTKRLLESGFRQGFDQINMRKMLQEPLFVPETIFVD 313 

Query: 301 DLLKALRNTQNQMAILLDEYGGVAGLVTLEDLLEEIVGEIDDETDTAEQFVREIDENIYI 360 
DLL+ LRNTQNQMAILLDEYGGVAGLVTLEDLLEEIVGEIDDETD AEQFV EI +N YI 

Sbjct: 314 DLLRQLRNTQNQMAILLDEYGGVAGLVTLEDLLEEIVGEIDDETDKAEQFVHEIGDNTYI 373 

Query: 361 VLGTMTLNEFNDYFETELESDDVDTIAGYYLTGVGSIPNQEEKVAYEVDSKDKHITLIND 420 
V+GTMTLNEFNDYF+TELESDDVDTIAG+YLTG+G+IP+QE+K AYE+D+KDKH+ LIND 
Sbjct: 374 WGTMTLNEFNDYFDTELESDDVDTIAGFYLTGIGTIPSQEQKEAYEIDNKDKHLVLIND 433 

Query: 421 KVKDGRI TKLKVLLSD IEQNI EKD 444 

KVKDGRITKLK++LS+IEQ IE+D 
Sbjct: 434 KVKDGRITKLKLILSNIEQI IEED 457 


SEQ ID 3718 (GBS70d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 120 (lane 8-10; MW 65kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 120 (lane 11 & 12; MW 44kDa) and in 
Figure 179 (lane 5; MW 35kDa). 

GBS70d-His was purified as shown in Figure 231, lane 9-10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1195 

A DNA sequence (GBSxl271) was identified in S.agalactiae <SEQ ID 3721> which encodes the amino 
acid sequence <SEQ ID 3722>. Analysis of this protein sequence reveals the following: 

Possible site: 46 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1212 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB84230 GB:AL162754 hypothetical protein NMA0960 [Neisseria 
meningitidis Z2491] 

Identities = 80/184 (43%) , Positives = 119/184 (64%) , Gaps = 3/184 (1%) 

Query: 1 MIKRPIHLSHDFLAEVIDKEAITLDATMGNGNDTVFLAKSSK KVYAFDIQEEAIAKT 57 

++K + +H L + + + LD T GNG+DT+FLA+++ KV+AFDIQ +A+ T 
Sbjct: 2 LLKNILPFAHCLLRQALPEGGNALDGTAGNGHDTLFLAQTAGIRGKWAFDIQPQALNNT 61 


Query: 58 KAKLTEQGI SNAEL I LDGHENLEQYVHTPLRAAI FNLGYLPSADKTVI TKPHTTI KAI KN 117 

+ +L E G SN LILDGHENL+QY+ PL AAIFN G+LP DK++ T+ T+I A+ 
Sbjct: 62 RCRLQEAGYSNVRLILDGHENLKQYIPKPLDAAIFNFGWLPGGDKSLTTRTETSIAALSA 121 

Query: 118 VLDILEVGGRLSLMVYYGHDGGKSEKDAVIAFVEQLPQNNFATMLYQPLNQVNTPPFLIM 177 

L +L+ G L ++Y GH+ GK E +A+ + + LPQ FA + Y N+ N+PP+L+ 
Sbjct: 122 ALSLLKENGMLIAvLYPGHENGKQFJ^IEQWAKNLPQEQFAVLRYSFTNRKNSPPYLLA 181 

Query: 178 VEKL 181 
EKL 

Sbjct: 182 FEKL 185 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3723> which encodes the amino acid 

sequence <SEQ ID 3724>. Analysis of this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.44 Transmembrane 127 - 143 ( 123 - 143) 

Final Results 

bacterial membrane Certainty=0.1574 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

A related sequence was also identified in GAS <SEQ ID 9101> which encodes the amino acid sequence 
<SEQ ID 9102>. Analysis of this protein sequence reveals the following: 

Possible site: 46 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.44 Transmembrane 118 - 134 ( 114 - 134) 

Final Results 

bacterial membrane Certainty= 0.157 (Affirmative) < suco 

bacterial outside Certainty= 0.000 (Not Clear) < suco 

bacterial cytoplasm Certainty= 0.000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 124/184 (67%) , Positives = 156/184 (84%) 



Query: 1 MIKRPIHLSHDFLAEVIDKEAITLDATMGNGNDTWLAKSSKKVYAFDIQEEAIAKTKAK 60 
M+KRPIHLSHDFLAEV+DK ++ +DATMGNGNDT FLA+ +KKVYAFD+QE+AI KT + 
Sbjct: 10 MLKRPlHLSHDFLAEVVDKSSVWDATMGNGtTOTAFLAQLAKKVYAFDVQEQAIRKTSER 69 

Query: 61 LTEQGISNAELILDGHENLEQYVHTPLRAAIFMLGYLPSADKTVITKPHTTIKAIKNVLD 120 
T, x ftlj-C;KraT7T.TTi dWR 4.J.HW P4.R&ATFWT fTVT.pciAn'K'-»--»-TT P4-TT4--l-A-t- -4-T. 
Sbjct: 70 IAQLGLSNAELILAGHEAVDQYVTEPVRAAIFNLGYLPSADKSIITLPNTTLQALSKLLT 129 

Query: 121 ILEVGGRLSL[WYYGHDGGKSEI<DAVIAFVEQLPQNOTATMLYQPLNQVNTPPFLIMVEK 180 
+L VGGR+++MVYYGHDGG EKDA++ FV+QL Q + MLYQPLNQVNTPPFLIM+EK 
Sbjct: 130 LLMVGGRIAIMVYYGHDGGSLEKDALLDFVKQLDQRKVSAMLYQPIMQVNTPPFLIMLEK 189 

Query: 181 LQSY 184 
L + 
Sbjct: 190 LADF 193 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1196 

A DNA sequence (GBSxl272) was identified in S.agalactiae <SEQ ID 3725> which encodes the amino 
acid sequence <SEQ ID 3726>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1948 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00380 GB:AF008220 YtqA [Bacillus subtilis] 
Identities = 161/302 (53%) , Positives = 220/302 (72%) , Gaps = 4/302 (1%) 






Query: 2 KKRYRAINDYYRELFGEKIFKLPIDAGFDCPNRDGTVARGGCTFCTVSGSGDAIVAPEAP 61 

+KRY +N + RE FG K+FK+ +D GFDCPNRDGTVA GGCTFC+ +GSGD 
Sbjct: 13 EKRYHTLNYHLREHFGHKVFKVALDGGFDCPNRDGTVAHGGCTFCSAAGSGDFAGNRTDD 72 

Query: 62 IREQFYKEIDFMHRKWPEVNKYLVYFQNFTNTHAKLEIIKERYEQAINEPGVIGINIGTR 121 

+ QF+ + MH KW + KY+ YFQ FTNTHA +E+++E++E + V+GI + I TR 
Sbjct: 73 LITQFHDIKNRMHEKWKD-GKY1AYFQAFTNTHAPVEVLREKFESVLALDDWGISIATR 131 

Query: 122 PDCLPDETIYYLAELSERMHVTLELGLQTTYEATSALINRAHSYDLYKKTVKRIRELAPK 181 

PDCLPD+ + YLAEL+ER ++ +ELGLQT +E T+ LINRAH ++ Y + V ++R+ 
Sbjct: 132 PDCLPDDVVDYrAELNERTYLWVELGLQTvHERTAIilNRAHDFNCYVEGVNKLRKHG-- 189 

Query: 182 vEIVSHLINGLPGETHDMMVENVRRCVTDNDIQGIKLHLLHLMTNTRMQRDYHEGRLRIjL 241 
+ + SH+INGLP e DMM+E + V D D+QGIK+HLLHL+ t m + Y +G+L L 

Sbjct: 190 IRVCSHIINGLPLEDRDMMMETAK-AVADLDVQGIKIHLLHLLKGTPMVKQYEKGKLEFL 248 

Query: 242 SQEDYISIICDQLEIIPKHIVIHRITGDAPRHMLIGPMWSLNKWEVLNAIDKEMEKRQSY 301 
SQ+DY+ ++CDQLEIIP +++HRITGD P ++IGPMWS+NKWEVL AI+KE+E R SY 
Sbjct: 249 SQDDYVQLVCDQLE 1 1 PPEMIVHRI TGDGPIEIiMIGPMWS VNKWEVLGAINKELENRGS Y 308 

Query: 302 QG 303 
QG 

Sbjct: 309 QG 310 






A related DNA sequence was identified in S.pyogenes <SEQ ID 3727> which encodes the amino acid 
sequence <SEQ ID 3728>. Analysis of this protein sequence reveals the following: 



Possible site: 57 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.2023 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 260/307 (84%) , Positives = 290/307 (93%) , Gaps = 1/307 (0%) 



Query: 1 MKKRYRAINDYYRELFGEKIFKLPIDAGFDCPNRDGTVARGGCTFCTVSGSGDAIVAPEA 60 
MKKRY+ +N++YR+LFG K+FK+PIDAGFDCPNRDGTVA GGCTFCTVSGSGDAIVAP+A 
Sbjct: 7 I'lJVfuvX^ x J-U.N Cjil x i\.\^u r >jxij\i*ir is. v r iUiiur xji— Jri\r\.LJo i v riuuvj i_. x r x v ouoouni v furx^ri 66 

Query: 61 PIREQFYKEIDF^RKWPEWKYL^^ 120 
jr x Toyr x ivniur i*inr\. ivinj ir -r vxNTXXJvxr yx^ir inin TTTiTTiviCiyttJ-i'icirij v tuhv j-ui 
Sbjct: 67 xr x xVEL'yjjr x JXCiiiJr i v jni\.r\.vv iru vx\i\.x u v x jt r x in x nu i v xj v -Lr\.j_jr\. x ^ji^friXi-Ndirvj v vuiiviui 126 

Query: 121 RPDCLPDETIYYLAELSERMHVTLELGLQTTYEATSALINRAHSYDLYKKTOKRIREIAP 180 
RPDCLPD+TI YLAELSERMHVT+ELGLQTTYE TS LINRAHSYDLYK+TV+R+R P 
Sbjct: 127 RPDCLPDDTIAYI^LSERMHvWELGLQTTYEETSRLINRAHSYDLYKETVRRLRHY-P 185 

Query: 181 KVEIVSHLINGLPGETHDIWVENvRRCTTDNDICGIKLHLLHL 240 
+ IVSHLINGLP ETHDMM+ENVRRCVTDND I QG I KLHLLHLMTNTRMQRDYHEGRL+L 
Sbjct: 186 NINIVSHLINGLPKETHDMMLENvRRCVTDNDIQGIKLHLLHLMTNTRMQRDYHEGRLKL 245 

Query: 241 LSQEDYI S 1 1 CDQLE1 IPKHIVIHRITGDAPRHMLlGPMWSLNKWEVTiNAIDKEMEKRQS 300 
LSQ+DY+SI I CDQLEI IPKHI VIHRITGDAPR MLIGPMWStNKWEVLNAIDKEME+R S 
Sbjct: 246 LSQKDYVS 1 1 CDQLE 1 1 PKHI VI HRITGDAPRDMLIGPMWSLNKWEVLNAI DKEMERRGS 305 

Query: 301 YQGCKAE 307 
+QGCK + 
Sbjct: 306 FQGCKVD 312 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1197 

A DNA sequence (GBSxl273) was identified in S.agalactiae <SEQ ID 3729> which encodes the amino 
acid sequence <SEQ ID 3730>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have an uncleavable N-term signal seq 



INTEGRAL Likelihood = -9.82 Transmembrane 10 - 26 ( 6 - 30) 
INTEGRAL Likelihood = -4.73 Transmembrane 93 - 109 ( 87 - 112) 
INTEGRAL Likelihood = -4.57 Transmembrane 163 - 179 ( 161 - 181) 
INTEGRAL Likelihood = -2.97 Transmembrane 189 - 205 ( 185 - 205) 
INTEGRAL Likelihood = -1.97 Transmembrane 58 - 74 ( 58 - 74) 
INTEGRAL Likelihood = -0.75 Transmembrane 130 - 146 ( 130 - 146) 


Final Results 



bacterial membrane Certainty=0.4927 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA79986 GB:Z21972 ORF2 [Bacillus megaterium] 
Identities = 62/159 (38%) , Positives = 92/159 (56%) , Gaps = 3/159 (1%) 



Query: 34 ISFDQTIQESTOGQLPNLSTRFFKLITVTGNTVSQIAIAIMSVTFCY--LKKWYPQARFI 91 

+ FD+ + V+G L T K T IG+T S I ++++ + F Y LK F 
Sbjct: 34 LKFDEDVISLVQGWESPLLTDIMKFFTYIGSTASLIILSLVILFFLYRILKHRLELVLFT 93 



Query: 92 AVNAI I SGI C I LSLKLI FQRVRPTLTHLVFAGGYSFPSGHSMGTFMI FGS I I ILLQYYMP 151 
AV + S + L +KL FQR RP L L+ GGYSFPSGH+M F ++G + LL ++ 
Sbjct: 94 AV-MVGSPLLNLMVKLFFQRARPDLHRLIDIGGYSFPSGHAMNAFSLYGILTFLLWRHIT 152 

Query: 152 KSIWKLLCQGTLGLLIFLIGLSRIYLGVHFPTDVLAGFI 190 

++L L+I IG+SRIYLGVH+P+D++AG++ 

Sbjct: 153 ARWARILLILFSMLMILSIGISRIYLGVHYPSDIIAGYL 191 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1851> which encodes the amino acid 
sequence <SEQ ID 1852>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have an uncleavable N-term signal seq 



INTEGRAL Likelihood =-11.30 Transmembrane 154 - 170 ( 150 - 181) 
INTEGRAL Likelihood =-10.88 Transmembrane 65 - 81 ( 58 - 93) 
INTEGRAL Likelihood = -8.97 Transmembrane 10 - 26 ( 5 - 31) 
INTEGRAL Likelihood = -3.77 Transmembrane 86 - 102 ( 86 - 105) 
INTEGRAL Likelihood = -2.71 Transmembrane 185 - 201 ( 183 - 202) 
INTEGRAL Likelihood = -1.54 Transmembrane 130 - 146 ( 130 - 148) 



Final Results 

bacterial membrane Certainty=0.5522 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 88/197 (44%) , Positives = 134/197 (67%) , Gaps = 1/197 (0%) 



Query: 1 MLSRQNSKLIQAFIAIILFFSLGLVIKYWPDTVISFDQTIQESvRGQLPNLSTRFFKLIT 60 
M ++Q LI +F A+++F +G +K++P+ + D TIQ +RG LP + T+FF+ +T 
Sbjct: 2 MTNKQTHFLIASF-ALLIFVIIGYTVKFFPERIJU^LDNTIQAEIRGNLPIVLTQFFRGVT 60 

Query: 61 VIGNTVSQIAIAIMSOTFCyLKKWYPQARFIAVNAIISGICILSLKLIFQRVRPTLTHLV 120 
V GN ++Q+ + I+SV + KW +A FI N 1+ I +LKL +QR RP + HLV 
Sbjct: 61 VFGNVMTQVLLVIVSVLVLFFMKWKIEALFILSNGAIAAFLITTLKLFYQRPRPAIEHLV 120 

Query: 121 FAGGYSFPSGHSMGTFMIFGSIIILLQYYMPKSIWKLLCQGTLGLLIFLIGLSRIYLGVH 180 
+AGGYSFPSGH+MG+ +IFGS++I+ + + + + +LI LIGLSRIYLGVH 
Sbjct: 121 YAGGYSFPSGHAMGSMLIFGSLLIICYQRLHSKLLQFVTSMIFIILILLIGLSRIYLGVH 180 

Query: 181 FPTDVLAGFILAYGILN 197 
+P+D+LAGF+L +GIL+ 
Sbjct: 181 YPSDILAGFVLGFGILH 197 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1198 

A DNA sequence (GBSxl274) was identified in S.agalactiae <SEQ ID 3731> which encodes the amino 
acid sequence <SEQ ID 3732>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 



INTEGRAL Likelihood = -8.44 Transmembrane 35 - 51 ( 33 - 59) 
INTEGRAL Likelihood = -6.53 Transmembrane 193 - 209 ( 179 - 211) 
INTEGRAL Likelihood = -4.46 Transmembrane 64 - 80 ( 60 - 82) 
INTEGRAL Likelihood = -4.09 Transmembrane 108 - 124 ( 103 - 128) 
INTEGRAL Likelihood = -2.71 Transmembrane 150 - 166 ( 148 - 166) 
INTEGRAL Likelihood = -0.06 Transmembrane 174 - 190 ( 174 - 190) 


Final Results 

bacterial membrane Certainty=0.4376 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9977> which encodes amino acid sequence <SEQ ID 9978> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC83944 GB:L47648 putative [Bacillus subtilis] 
Identities = 53/186 (28%) , Positives = 109/186 (58%) 

Query: 33 RKMVTIAILSALSFVLMMVSFPLIPGAEFLKVDFSILPMLVAFILFDLKSSYGVLLLRSL 92 

+K+V +++LS+++FVLM+++FP ++LK+DFS +P ++A +++ + V ++++ 

Sbjct: 4 KKLVWSMLSSIAFVLMLLNFPFPGLPDYLKIDFSDVPAIIAILIYGPIiAGIAVEAIKNV 63 

Query: 93 LKVILANRGPETFIGLPMNMVAIALFIASFAIFWKmESAKDFIKASLFGWSLTVS^^VA 152 

L+ 1+ +G N +A LF+ A +K SAK + L GT ++T+ M 

Sbjct: 64 LQYIIQGSMAGVPVGQVANFIAGTLFILPTAFLFKKLNSAKGLAVSLLLGTAAMTILMSI 123 

Query: 153 LiNYVFAIPLYAIFANFDIRTFIGVGNYLLTMVIPFNIVEGILISIVFYLTYVACLPILER 212 

LNYV +P Y F + + + ++ ++PFN+++GI+I++VF L ++ P +E+ 
Sbjct: 124 LNYVIjILPAYTWFLHSPALSDSALKTAVVAGILPFNMIKGIVITvVFSLIFIKLKPWIEQ 183 

Query: 213 YKKTNV 218 
+ ++ 

Sbjct: 184 QRSAHI 189 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3733> which encodes the amino acid 
sequence <SEQ ID 3734>. Analysis of this protein sequence reveals the following: 

Possible site: 26 
>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -6.48 Transmembrane 82 - 98 ( 74 - 100) 

INTEGRAL Likelihood = -3.93 Transmembrane 161 - 177 ( 152 - 178) 

INTEGRAL Likelihood = -3.61 Transmembrane 108 - 124 ( 107 - 126) 

INTEGRAL Likelihood = -3.61 Transmembrane 33 - 49 ( 31 - 50) 



Final Results 

bacterial membrane Certainty=0.3590 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:AAC83944 GB:L47648 putative [Bacillus subtilis] 
Identities = 46/182 (25%) , Positives = 97/182 (53%) 

Query: 3 KTHKMIMIGILSAISFLLMLVSFAIIPGAAFLKIEFSIIPVLFGLMIMDLKSAYLILLLR 62 

K K++++ +LS+I+F+LML++F +LKI+FS +P + ++I + + ++ 

Sbjct: 2 KVKKLVWSMLSS IAFVLMLLNFPFPGLPDYLKI DFSDVPAI IAILI YGPLAGIAVEAIK 61 

Query: 63 SLLKLFliNNRGVNDFIGLPMNIIAIALFVTAFALVWNRQKTLSQYVFASLLGTGLLTFGM 122 

++L+ + +G N IA LF+ A ++ + + + LLGT +T M 

Sbjct: 62 NVLQYIIQGSMAGVPVGQVANFIAGTLFILPTAFLFKKLNSAKGLAVSLLLGTAAMTILM 121 

Query: 123 VVLNYTFAIPLYAIFANIDIRAYIGVTKYMMTMVIPFNLVEGLIFAITFYFVYIASKPIL 182 

+LNY +P Y F + + + ++ ++PFN+++G++ + F ++I KP + 
Sbjct: 122 SILNYVLILPAYTWFLHSPALSDSALKTAVVAGILPFNMIKGIVITVVFSLIFIKLKPWI 181 

Query: 183 ER 184 
E+ 

Sbjct: 182 EQ 183 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 110/185 (59%) , Positives = 144/185 (77%) 



Query: 29 MTNTRKMOTIAILSALSFVLMMVSFPLIPGAEFLKVDFSILPMLVAFILFDLKSSYGVLL 88 
M+ T KM+ I ILSA+SF+LM+VSF +IPGA FLK++FSI+P+L ++ DLKS+Y +LL 
Sbjct: 1 MSKTHKMIMIGILSAISFLLMLVSFAIIPGAAFLKIEFSIIPVXFGLMIMDLKSAYLILL 60 

Query: 89 LRSLLK^ILAmGPETFIGLPMNWAIALF]^FAIFWKNRESAKDFIKASLFGTOSLTV 148 

LRSLLK+ L NRG FIGLPMN++A+ALF+ +FA+ W +++ ++ ASL GT LT 
Sbjct: 61 LRSLLKLFLNNRGVOT^FIGLPMNIIAIALFVTAFALVWmQKTLSQYVFASLLGTGLLTF 120 

Query: 149 SMVALNYVFAI PLYAI FANFDI RTF I GVGNYLLTMVI PFNI VEGI LI S I VFYLTYVACLP 208 

MV LNY FAI PLYAI FAN DIR +IGV Y++TMVIPFN+VEG++ +1 FY Y+A P 
Sbjct: 121 GMTOLNYTFAIPLYAIFANIDIRAYIGOTKYMMTWIPFNLVEGLIFAITFYFVYIASKP 180 

Query: 209 ILERY 213 

ILERY 
Sbjct: 181 ILERY 185 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1199 

A DNA sequence (GBSxl275) was identified in S.agalactiae <SEQ ID 3735> which encodes the amino 
acid sequence <SEQ ID 3736>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood =-11.04 Transmembrane 278 - 294 ( 270 - 298) 



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3736 (GBS150) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 23 (lane 7; MW 29.7kDa) and in Figure 175 (lane 4 & 5; MW 30kDa). 

Purified GBS150-His is shown in Figure 110A, Figure 199 (lane 5) and Figure 227 (lanes 6-7). 

The purified GBS150-His fusion product was used to immunise mice (lane 1+2 product; 20ug/mouse). The 
resulting antiserum was used for Western blot (Figure 110B), FACS (Figure 110C), and in the in vivo 
passive protection assay (Table III). These tests confirm that the protein is immunoaccessible on GBS 
bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1200 

A DNA sequence (GBSxl276) was identified in S.agalactiae <SEQ ID 3737> which encodes the amino 
acid sequence <SEQ ID 3738>. This protein is predicted to be a fimbria-associated protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-15.34 Transmembrane 264 - 280 ( 257 - 285) 
INTEGRAL Likelihood = -7.64 Transmembrane 23 - 39 ( 12 - 41) 



Final Results 



bacterial membrane Certainty=0.5416 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 



Final Results 

bacterial membrane Certainty=0.7135 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC13546 GB:AF019629 putative fimbria-associated protein 
[Actinomyces naeslundii] 
Identities = 95/271 (35%) , Positives = 139/271 (51%) , Gaps = 16/271 (5%) 

Query: 29 VGLLITSYPFISNWYYNIKANNQVTNFDNQTQKIOTKEINRRFEIAKAYNRTLDPSRLSD 88 

+GLL +YP ++W + ++ Q + + E A AYN L + + 

Sbjct: 1 MGLL- -TYPTAASWSQYNQSKVTADYSAQVDGARP -DAKTQVEQAHAYNDALSAGAVLE 57 

Query: 89 PYTE KEKKGIAEYAHMLEIAE- -MIGYIDIPSIKQKLPIYAGTTSSVLEKGAGH 140 

K +YA++L+ ++ + IPSI LP+Y GT L KG GH 

Sbjct: 58 ANNHVPTGAGSSKDSSLQYANILKANNEGLMARLKIPSISLDLPVYHGTADDTLLKGLGH 117 

Query: 141 LEGTSLPIGGKSSHOTITAHRGLPKAKLFTDLDKLKKGKIFYIHNIKEVLAYKVDQISW 200 
LEGTSLP+GG+ + +VIT HRGL +A +FT+LDK+K G + EVL Y+V W 
Sbjct: 118 LEGTSLPVGGEGTRSVITGHRGIAEATMFTNLDKVlCrGDSLIvEVFGEvljTYRvTSTKVV 177 

Query: 201 KPDNFSKLLWKGKDYATLLTCTPYSINSHRLLVRGHRIKYVPPVKEKNYLMKELQTHYK 260 

+P+ L V +GKD TL+TCTP IN+HR+L+ G RI Y P K+ K + 

Sbjct: 178 EPEETEALRVEEGKDLLTLVTCTPLGINTHRILLTGERI-YPTPAKDLAAAGKRPDVPHF 236 


Query: 261 LYFLLS I LVIL I LVALLL YLKRKFKER 287 

++ + + LI+V L L Y + KER 
Sbjct: 237 PWWAVGIiAAGLI WGLYLWRSGYAAARAKER 267 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3739> which encodes the amino acid 
sequence <SEQ ID 3740>. Analysis of this protein sequence reveals the following: 
Possible site: 49 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood =-14.01 Transmembrane 225 - 241 ( 220 - 248) 

Final Results 

bacterial membrane Certainty=0.6604 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:AAC13546 GB:AF019629 putative fimbria-associated protein 
[Actinomyces naeslundii] 
Identities = 94/250 (37%) , Positives = 133/250 (52%) , Gaps = 17/250 (6%) 

Query: 1 VECYRDRQLLSTYHKQVTQKKPSEMEEVWQKAKAYNARLGIQPVPDAF SFRD 52 

V Y ++ + Y QV +P +V ++A AYN L V +A S +D 

Sbjct: 13 VSQYNQSKOTADYSAQVDGARPDAKTQV-EQAHAYNDALSAGAVLEANNHVPTGAGSSKD 71 


Query: 53 GIHDKiraSLLQIFjgNDIMGYVEVPSIKVTLPIYHYTTDEVLTKGAGHLFGSALPVGGDG 112 

Y ++L+ N +M +++PSI + LP+YH T D+ L KG GHL G++LPVGG+G 
Sbjct: 72 S--SLQYANILKANNEGLMARLKIPSISLDLPVYHGTADDTLLKGLGHLEGTSLPVGGEG 129 

Query: 113 THTVISAHRGLPSAEMFTOTjNLVKKGDTFYFRVLNKVLAYKVDQILTvEPD 172 

T +VI+ HRGL A MFTNL+ VK GD+ V +VL Y+V VEP++ +L 

Sbjct: 130 TRSVITGHRGIAFATMFTNLDKVKTGDSLIVEWGEVliTYRVTSTKVVEPEETEALRVEE 189 

Query: 173 GKDYATLOTCTPYGVOTKRLLVRGHRIAYHYKKYQQA^ 232 
GKD TLVTCTP G+NT R+L+ G RI Y K + K A G+ 

Sbjct: 190 GKDLLTLVTCTPLGINTHRILLTGERI YPTPAKDLAAAGKRPDVPHFPWWAVGL 243 



Query: 233 VIAIILVFMY 242 
+I+V +Y 
Sbjct: 244 AAGLIWGLY 253 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 93/192 (48%) , Positives = 130/192 (67%) , Gaps = 2/192 (1%) 






Query: 52 VTNFDNQTQKLNTKEINRRFEIAKAYNRTLDPSRLSDPYTEKEKKGIAEYAHMLEIA--E 109 
++ + Q + E+ ++ AKAYN L + D ++ ++ Y +L+I + 
Sbjct: 10 LSTYHKQVTQKKPSEMEEVWQKAKAYNARLGIQPVPDAFSFRDGIHDKNYESLLQIENND 69 

Query: 110 MIGYIDIPSIKQKLPIYAGTTSSVLEKGAGHLEGTSLPIGGKSSHTVITAHRGLPKAKLF 169 
++GY+++PSIK LPIY TT VL KGAGHL G++LP+GG +HTVI +AHRGLP A++F 
Sbjct: 70 IMGYVEVPSIKVTLPIYHYTTDEVLTKGAGHLFGSALPVGGDGTHTVISAHRGLPSAEMF 129 

Query: 170 TDLDKLKKGKI FY1HNI KEVLAYKVDQI SWKPDNFSKLLWKGKDYATIiLTCTPYS INS 229 
T+L+ +KKG FY + +VLAYKVDQI V+PD + L V GKDYATL+TCTPY +13+ 
Sbjct: 130 TNL^VKKGDTFYFRVIJtfKVIAYKVD^^ 189 

Query: 230 HRLLVRGHRIKY 241 
RLLVRGHRI Y 
Sbjct: 190 KRLLVRGHRIAY 201 





SEQ ID 3738 (GBS210) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 50 (lane 3; MW 61kDa). 

GBS210d was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 152 (lane 2-4; MW 54kDa) and in Figure 187 (lane 9; MW 54kDa). It was also expressed
in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 154 (lane 2-4; 
MW 28.7kDa) and in Figure 182 (lane 13; MW 29kDa). Purified GBS210d-GST is shown in lane 4 of 
Figure 237. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1201 

A DNA sequence (GBSx1277) was identified in S.agalactiae <SEQ ID 3741> which encodes the amino
acid sequence <SEQ ID 3742>. This protein is predicted to be a fimbria-associated protein. Analysis of this
protein sequence reveals the following:

Possible site: 42

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.61 Transmembrane 20 - 36 ( 15 - 40)
INTEGRAL Likelihood = -7.27 Transmembrane 259 - 275 ( 258 - 277)

Final Results

bacterial membrane Certainty=0.5246 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC13546 GB:AF019629 putative fimbria-associated protein
[Actinomyces naeslundii]
Identities = 76/219 (34%), Positives = 120/219 (54%), Gaps = 12/219 (5%)

Query: 28 LSILLYPWSRFYYTIESNNQTQDFERAAKKLSQKEINRRMALAQAYNDSLN N 80

+ +L YP + + + T D+ A ++ + ++ A AYND+L+ N 

Sbjct: 1 MGLLTYPTAASWSQYNQSKVTADYS-AQVDGARPDAKTQVEQAHAYNDALSAGAVLEAN 59 

Query: 81 vHLEDPYEKKRIQKGVAEYARMLEVSEK- - IGTISVPKIGQKLPIFAGSSQEVLSKGAGH 138 
H+ P + +YA +L+ + + + + +P I LP++ G++ + L KG GH

Sbjct: 60 NHV--PTGAGSSKDSSLQYANILKAKNEGLMARLKIPSISLDLPVYHGTADDTLLKGLGH 117 

Query: 139 LEGTSLPIGGNSTHTVITAHSGIPDKELFSNLKKLKKGDKFYIQNIKETIAYQVDQIKVV 198 

LEGTSLP+GG T +VIT H G+ + +F+NL K+K GD ++ E + Y+V KW 
Sbjct: 118 LEGTSLPVGGEGTRSVITGHRGIAFATMFTNLDKVKTGDSLIVEVFGEVLTYRVTSTKVV 177 

Query: 199 TPDNFSDLLWPGHDYATLLTCTPIMINTHRLLVRGHRI 237 

P+ L V G D TL+TCTP+ INTHR+L+ G RI 
Sbjct: 178 EPEETFALRVEEGKDLLTLVTCTPLGINTHRILLTGERI 216 

There is also homology to SEQ ID 3740. 
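
The Identities, Positives and Gaps figures quoted with each alignment are simple ratios over the aligned length, so they can be re-derived from the counts. The following Python sketch is illustrative only and is not part of the original analysis; `parse_summary` and `percent` are hypothetical helper names. It parses the summary line reported above for the Actinomyces naeslundii hit and recomputes each percentage. For this line the recomputed values (34.7%, 54.8%, 5.5%) suggest that the reported figures are truncated to whole numbers rather than rounded.

```python
import re

# One "Name = count/length (pct%)" field of an alignment summary line.
SUMMARY = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")


def parse_summary(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in SUMMARY.findall(line)
    }


def percent(count, length):
    """Recompute the percentage from the raw counts."""
    return 100.0 * count / length


# Summary line copied from the alignment above.
line = "Identities = 76/219 (34%), Positives = 120/219 (54%), Gaps = 12/219 (5%)"
stats = parse_summary(line)

for name, (count, length, reported) in stats.items():
    print(f"{name}: {count}/{length} = {percent(count, length):.1f}% (reported {reported}%)")
```

The same helper can be applied to any of the other summary lines in this section to check a transcribed count against its percentage.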

A related GBS gene <SEQ ID 8749> and protein <SEQ ID 8750> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop Possible site: -1 Crend: 10 

McG: Discrim Score: 9.66 

GvH: Signal Score (-7.5): -6.53 
Possible site: 42 

>>> Seems to have an uncleavable N-term signal seq

ALOM program count: 2 value: -10.61 threshold: 0.0 

INTEGRAL Likelihood =-10.61 Transmembrane 20 - 36 ( 15 - 40) 
INTEGRAL Likelihood = -7.27 Transmembrane 259 - 275 ( 258 - 277) 
PERIPHERAL Likelihood = 5.14 216 
modified ALOM score : 2.62 
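
The membrane-span lines above follow a regular layout (a likelihood score, the predicted segment, and a wider flanking range in parentheses), so they can be tabulated programmatically. The Python sketch below is illustrative only; `parse_segments` is a hypothetical helper, and reading the reported "count" as the number of INTEGRAL segments and "value" as the most negative likelihood is an assumption that happens to agree with the output shown here (count 2, value -10.61).

```python
import re

# One INTEGRAL line: likelihood, predicted span, and flanking range.
SEGMENT = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return one dict per INTEGRAL line found in the text."""
    segments = []
    for score, start, end, lo, hi in SEGMENT.findall(text):
        segments.append(
            {
                "likelihood": float(score),
                "span": (int(start), int(end)),
                "flank": (int(lo), int(hi)),
            }
        )
    return segments


# Lines copied from the output above; the PERIPHERAL line is ignored.
report = """
INTEGRAL Likelihood =-10.61 Transmembrane 20 - 36 ( 15 - 40)
INTEGRAL Likelihood = -7.27 Transmembrane 259 - 275 ( 258 - 277)
PERIPHERAL Likelihood = 5.14 216
"""

segments = parse_segments(report)
count = len(segments)
most_negative = min(s["likelihood"] for s in segments)
print(count, most_negative)
```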

*** Reasoning Step: 3 



Final Results 

bacterial membrane — Certainty=0. 5246 (Affirmative) < suco 

bacterial outside — Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

33.4/53.0% over 277aa
Actinomyces naeslundii
GP|3036999| putative fimbria-associated protein Insert characterised
ORF00563(382 - 1179 of 1479)

GP|3036999|gb|AAC13546.1||AF019629(1 - 278 of 365) putative fimbria-associated protein
{Actinomyces naeslundii}
%Match = 13.4
%Identity = 33.3 %Similarity = 53.0
Matches = 90 Mismatches = 118 Conservative Sub.s = 53

180 210 240 270 300 330 360 390 

WIMKRRQSKEA*G*SLMMYKRS*SCAYDLRVFQ*KYS*IISKSHYLGDDVKTKKIIKKTKKKKKSNLPFIILFLIGLSI 

MGL 

420 450 480 510 549 579 609 

LLYPWSRFYYTIESNNQTQDFERAAKKLSQKEINRRMALAQAYNDSLN NVHLEDPYEKKRIQKGVAEYARML 

lll=: = I 1= I 1 = 1111 = 1= 11=1 = =11 =| 

LTYPTAASWVSQYNQSKVTADYS -AQ vDGARPDAKTQvECAHAYNDALSAGAVLEANNHV- - PTGAGSSKDSSLQYANIL 
20 30 40 50 60 70 80 

633 663 693 723 753 783 813 843 

EVS--EKIGTISVPKIGQKLPIFAGSSQEVLSKGAGHLEGTSLPIGGNSTHTVITAHSGIPDKELFSNLKKLKKGDKFYI 
= = = = =1 I lh= h = = I II llllllllhll I =111 I h : =1 = 11 hi || : : 
KANNEGLMARLKIPSISLDLPVYHGTADDTLLKGLGHLEGTSLPVGGEGTRSVITGHRGLAEATMFTNLDKVKTGDSLIV 
90 100 110 120 130 140 150 160 



873 903 933 963 993 1023 1053 1083 

QNIKETIAYQVDQIKVVTPDNFSDLLWPGHDYATLLTCTPIMINTHRLLWGHRIPYKGPIDEICLIKDGHLNTIYRYLF 
= I = 1 = 1 1111= I I I I Ihlllh 11111 = 1= 11111=1 I = = =

EVFGEVLTYRVTSTKVVEPEETEALRVEEGKDLLTLVTCTPLGINTHRILLTGERI-YPTPAKD-IJWiGKRPDVPHFPW 
170 180 190 200 210 220 230 

1098 1179 1209 1239 1269 1299 

Y ISLVIIAWLLWL--IKRQRQKm-LASWKGIES*VffiENFRKTLRmSF*IDG*M*A*YYCS*LVF**PHILLF 

l=== II I I I II I I = = =1 I == 11= 

WAVGLAAGLIWGLYLWRSGYAAARRKERALARARAA.QEEPQPQTWAEQMRIWMDDDAGVEPQRWFTDLPVPPQPSEMEN 
250 260 270 280 290 300 310 

SEQ ID 8750 (GBS212) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 4; MW 36kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 50 (lane 2; MW 61kDa). 

Purified Thio-GBS212-His is shown in Figure 244, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1202 

A DNA sequence (GBSx1278) was identified in S.agalactiae <SEQ ID 3743> which encodes the amino
acid sequence <SEQ ID 3744>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.40 Transmembrane 680 - 696 ( 674 - 699)

Final Results

bacterial membrane Certainty=0.5161 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA57459 GB:X81869 orf2 [Lactobacillus leichmannii]

Identities = 84/325 (25%), Positives = 122/325 (36%), Gaps = 94/325 (28%)

Query: 397 VNWYTLKDKD KTVASVSLTKTSKGTI DLGNGIKFEVSGNF 437 

VNV + +KDKD TV+ LTK++ T+ D G + F+ + 
Sbjct: 236 VNVPWNIKDKDTFNVVDKPDTGIDIDASTVSirX3LTKSTDYTVNKKDNGYQVVFKTT 292

Query: 438 SGKFTGLENKSYMISERVSGYGSAINLENGKVTITNTKDSDNPTPLNPTEPKVETHGKKF 497 

S L KS 1+ K T+TN D + T +G 

Sbjct: 293 SAAVQALAGKSLTITY KATLTNNATPDKA- - IGNTATLSIGNGTNI 336 


Query: 498 VKTNEQGDRL- -AGAQFWKNSAGKfLALKADQSEGQKTLAAKKIALDEAIAAYNKLSAT 555 

T G R+ GAQFV K+S + KTLA + L + + N +S 

Sbjct: 337 TSTPANGPRIYTGGAQFVKKDS QSNKTLAGAEFQLVKVDSNGNIVSYA 384 

Query: 556 DQKGEKGITAKELIKTKQADYDAAFIEARTAYEWITDKARAITYTSNDQGQFEVTGLADG 615
Q + +Y W A TYTS+ G + GL+ 

Sbjct: 385 TQASDG SYTWNDSATEATTYTSDANGLVALKGLSYS 420 

Query: 616 TYNLEETIAPAGFAKLAGNIKFVWQGSYITGGNIDYVANSNQKDATRVENKK 668 

+Y L E AP G+AKL +KF + QGS+ G+ + + N K+

Sbjct: 421 DKLDSGESYALLEIQAPDGYAKLDSPVKFSITQGSF GDSNKITIDNTKEG 470 

Query: 669 VTIPQTGGIGTILFTIIGLSIMLGA 693 
+P TGG G +F IG+ IM+ A 
Sbjct: 471 -LLPSTGGKGIYIFLAIGIVIMIVA 494



No corresponding DNA sequence was identified in S.pyogenes. 



WO 02/34771 PCT/GB01/04789 


SEQ ID 3744 (GBS59) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 7 (lane 8; MW 120kDa), in Figure 11 (lane 9; MW 100kDa) and in Figure 13
(lane 6; MW 74kDa). 

GBS59-His was purified as shown in Figure 193, lane 2. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1203 

A DNA sequence (GBSx1279) was identified in S.agalactiae <SEQ ID 3745> which encodes the amino
acid sequence <SEQ ID 3746>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -3.13 Transmembrane 870 - 886 ( 864 - 887)

Final Results

bacterial membrane Certainty=0.2253 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD33086 GB:AF071083 fibronectin-binding protein I [Streptococcus pyogenes]

Identities = 58/176 (32%) , Positives = 83/176 (46%) , Gaps = 19/176 (10%) 

Query: 6 KFSKILTLSLFCLSQIPLNTNVLGEST; — VPENGA--KGKLWKKTDDQNKPLSKATFV 60 
K S +L+L+ F L + + + G S NGA +G +KK D NKPL AT 

Sbjct: 8 KLSFLLSLTGFILGLLLVFIGLSGVSVGHAETRNGANKQGSFEIKKVDQNNKPLPGATSS 67

Query: 61 LKTTAHPESKIEKVTAELTGEATFDNLIPGDYTLSEETAPEGYKKTNQTWQVKVESNGKT 120 

L + + ++ T+ G NL PG YTL EETAP+GY KT++TW V V NG T 

Sbjct: 68 LTSKDGKGTSVQTFTSNDKGITOAQNLQPGTYTLKEETAPDGYDKTSRTWTVTVYENGYT 127 


Query: 121 TIQNSGDKNSTIGQNQEELDKQYPPTGIYEDTKESYKLEHVKGSVPN--GKSEAKA 174 
+ + I + +D S +LE+ K SV + GK+E + 

Sbjct: 128 KLVENPYNGEI I SKAGS KDVSSSLQLENPKMSWSKYGKTEVSS 171 

Identities = 31/92 (33%) , Positives = 49/92 (52%) , Gaps = 14/92 (15%) 


Query: 725 PTITIKNEKKLGEIEFIKVDKDI^KLLLKGATFELQEFNEDYKLYLPIKNNNSKvVTGEN 784 

P+IT+ N K++ ++ F K+ DN + L A FEL+ N N+ K+ N 

Sbjct: 501 PSITVANLKRVAQLRFKKMSTDN- -VPLPEAAFELRSSN GNSQKLEASSN 548 

Query: 785 --GKISYKDLKDGKYQLIEAVSPEDYQKITNK 814

G++ +KDL GYLE +P+ YQ++T K 
Sbjct: 549 TQGEVHFKDLTSGTYDLYETKAPKGYQQVTEK 580 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3746 (GBS67) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 7 (lane 10; MW 140kDa), in Figure 11 (lane 10; MW 150kDa) and in Figure 12 
(lane 6; MW 95.3kDa). 

GBS67-His was purified as shown in Figure 192, lane 10. 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 1204

A DNA sequence (GBSx1280) was identified in S.agalactiae <SEQ ID 3747> which encodes the amino
acid sequence <SEQ ID 3748>. This protein is predicted to be Nra. Analysis of this protein sequence
reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2020 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9979> which encodes amino acid sequence <SEQ ID 9980> 
was also identified. 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3749> which encodes the amino acid 
sequence <SEQ ID 3750>. Analysis of this protein sequence reveals the following: 

Possible site: 58 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.75 Transmembrane 393 - 409 ( 392 - 409) 

Final Results 

bacterial membrane Certainty=0 . 1702 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 122/325 (37%), Positives = 186/325 (56%), Gaps = 5/325 (1%) 



Query: 7 LIENYLEKDILNQIKLLTLCY--DYYPSITLDKSCHQLGLSELLIRKYCHDLTTLFNSQL 64
LIE YLE I ++ +L+ L + Y P + + + GL+ L + YC +L F L
Sbjct: 1 LIEKYLESSIESKCQLIVLFFKTSYLP---ITEVAEKTGLTFLQLNHYCEELNAFFPGSL 57

Query: 65 SLNIEKSTIVYQSNGVTREQAFKYIYHQSHVLQLLKFLITNDSGRLPLTYFSEKFGLSCA 124
S+ I+K I Q +E +Y S+VLQLL FLI N S PLT F+ LS +
Sbjct: 58 SMTIQKRMISCQFTHPFKETYLYQLYASSNVLQLLAFLIKNGSHSRPLTDFARSHFLSNS 117

Query: 125 TAYRIRKHISPLLEKLGFQIVTQJTITGDEYRIRYLIAFLNAQFGIEVYPMSKMDKLLIKR 184
+AYR+R+ + PLL ++ KN I G+EYRIRYLIA L ++FGI+VY +++ DK I
Sbjct: 118 SAYRMREALIPLLRNFELKLSKNKIVGEEYRIRYLIALLYSKFGIKVYDLTQQDKNTIHS 177

Query: 185 LLLEHSTTFTASHYFPNTFIFFDTLLSLSWKRINYNVWPYSSLFTELQNIFIYDTLQYC 244
L ST S + +F F+D LL+LSWKR ++V +P + +F +L+ +F+YD+L+
Sbjct: 178 FLSHSSTHLKTSPWLSESFSFYDILLALSWKRHQFSVTIPQTRIFQQLKKLFVYDSLKKS 237

Query: 245 VKNVIIDSFKINLKKDDIDYIFLAYLTSHNSFSNPNWTEJCRIDNVIAIFENYPKFQKLLQ 304
++I ++N D+DY++L Y+T++NSF++ WT + I +FE F+ LL
Sbjct: 238 SHDIIETYCQLNFSAGDLDYLYLIYITANNSFASLQWTPEHIRQYCQLFEENDTFRLLLN 297

Query: 305 PLKDALPLSGSYHDELVKVAI FFSE 329
P+ LP LVK +FFS+
Sbjct: 298 PIITLLPNLKEQKASLVKALMFFSK 322





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1205

A DNA sequence (GBSx1281) was identified in S.agalactiae <SEQ ID 3751> which encodes the amino
acid sequence <SEQ ID 3752>. This protein is predicted to be galactosyltransferase. Analysis of this protein
sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence (or aa 1-22)

Final Results

bacterial cytoplasm Certainty=0.1168 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB99071 GB:U67549 galactosyltransferase isolog [Methanococcus jannaschii]

Identities = 108/395 (27%), Positives = 196/395 (49%), Gaps = 28/395 (7%)

Query: 4 KAnCTVAVFSGyYLPFLGGIERYTDKMTADLVK-RGYRWIVTTNHGDLPIIDEDKGR 59 

K+K + +F GYY+P +GG+E +D+TL+ Y + I N +P E+R 
Sbjct: 3 KIKLI-IFPGYYIPHIGGLETHVDEFTKHLSEDENYDIYIFAPN IPKYKEFEIRHNN 58

Query: 60 -KIYRLPTKNIVKQRYPIINK-NREYNTLMKYVSDENIDFVICNTRFQLTTLEGLSFAKN 117 

K+YR P 1+ YP+ N N ++ + + + D V+ TRF TL G FAK 
Sbjct: 59 VKVYRYPAFEIIPN-YPVPNIFNIKFWRMFFNLYKIDFDIVMTRTRFFSNTLLGFIFAKL 117 


Query: 118 HHLPS- - IVLDHGSSHFSVNNRFLDFFGftlYEHLLTARVKHYRPDFYAVSKRSVEWIiKHF 175 

I ++HGS+ + + F + Y+ + + A+SK ++ 

Sbjct : 118 RFKKKKLIHVEHGSAFVTCLESEFKNKLSYFYDKTIGKLIFKKADYVVAISKAVKNFILEN 177 

Query: 176 NIEAKGV- - IYNSVS ESLGSDFAGTAYLEKSADDIFITYAGRIIKEKGIELLLEAF 229

+ K + IY + ES+G D EK + I + + GR+ K KG+E +++A+ 

Sbjct: 178 FVNDKDIPIIYRGLEIEKIESIGED KKIKEKFKNKIKLCFVGRLYKWKGVENIIKAY 234 

Query: 230 S--MSQYSENVYLQIAGDGPELAHLKE KYQSKQINFLGKLNFEQTMSLMAQTDIFVY 284 

E + L + G G +L LK+ Y + IF GK++FE+ ++++ +DI+++

Sbjct: 235 TOLPKDLKEKIILIWGYGEDLERLKKLAGNYLNNGIYFTGKVDFEKAIAIVKASDIYIH 294 

Query: 285 PSMYPEGLPTS ILEAGLLSSAI IATDRGGTVE VTDSPELGI IMEENT - QSLHESLDLLVK 343 
S GL +S+L+A AI+A+ G EV+ GI++++N+ + + + L++ 

Sbjct: 295 SSYKGGGLSSSLLQAMCCGKAIVASPYEGADEWIDGYNGILLKDNSPEEIKRGIIKLIE 354

Query: 344 DKALREKLQQNIAKRIKEHFTWEKTVEKLDYIIQK 378 

+ LR+ +N IKE+F W+K+V++ I ++ 
Sbjct: 355 NNNLRKIYGENAKNFIKENFNWKKSVKEYKKIFER 389 


No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3752 (GBS258) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 45 (lane 2; MW 43kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 48 (lane 7; MW 67.9kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1206 

A DNA sequence (GBSx1282) was identified in S.agalactiae <SEQ ID 3753> which encodes the amino
acid sequence <SEQ ID 3754>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1182 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB52237 GB:Z98171 EpsQ protein [Streptococcus thermophilus] 
Identities = 112/278 (40%) , Positives = 163/278 (58%) , Gaps = 2/278 (0%) 



Query: 1 MKYIAGIVTFNPNIERLDQNIRAIYPQVSHIYIVDNGSKNKEEISQLVADYNEEGHLTVD 60
M AGIV FNP+I+RL +NI A+ Q +H+Y+VDNGS N +E+ L+ YN+ +++
Sbjct: 1 MDISAGIVLFNPDIKRLKENIDAVIIQCTHLYLVDNGSGNVDEVKGLRJQYNQS-KISIL 59

Query: 61 YLTENKGIAYAIiNCIGQFAVAQEFDWFLTLDQDSWLGDLIDNYENYLHLPKVGMLSCLY 120
+ EN+GIA ALN + A + FDW LTLDQDSW +++ +E Y++" VG+L +
Sbjct: 60 WNRENQGIAKALNQLTSAAQKEGFDWILTLDQDSWPSNIVGEFEKYINNSSVGILCPII 119

Query: 121 QDMNRENLVMQEFDYKEIEECITSAALMKTSVFEETSGFAEEMFIDFVDSEMNYRLSEMG 180
D N++ + D EI+ECITS +L+ + E GF E MFID VD ++ YRL + G
Sbjct: 120 CDRNKDEEIKINEDCTEIDECITSGSLLNIKAWSEIGGFDERMFIDGVDFDICYRLRQRG 179

Query: 181 YKTYQVNFIGLLHEIGHSSRVKKFGHVFHVLNHSPFRKYYMIRNAIYIIKKYGKKKRYKY 240
YK Y ++ + LLHE+GH + V NHS FRKYY+ RN IY KK
Sbjct: 180 YKIYCIHSVVLLHELGHIEYHRFLFWKVLVKNHSAFRKYYIARNIIYTAKKRRSTLLWK 239

Query: 241 LVFMRNEFVRVLV-AEEQKSKKIVAMIKGLKDGIiLMKV 277
+ + + +++ EE K KI + +G+ DG KV
Sbjct: 240 GLLQE1KLIGIVIFYEEDKLNKIRCICRGIYDGFKGKV 277





No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1207 

A DNA sequence (GBSx1283) was identified in S.agalactiae <SEQ ID 3755> which encodes the amino
acid sequence <SEQ ID 3756>. This protein is predicted to be EpsU protein (rfbX). Analysis of this protein 
sequence reveals the following: 

Possible site: 54 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.44 Transmembrane 357 - 373 ( 352 - 387)
INTEGRAL Likelihood = -7.59 Transmembrane 88 - 104 ( 79 - 107)
INTEGRAL Likelihood = -7.32 Transmembrane 440 - 456 ( 433 - 465)
INTEGRAL Likelihood = -6.48 Transmembrane 246 - 262 ( 245 - 263)
INTEGRAL Likelihood = -4.78 Transmembrane 294 - 310 ( 290 - 312)
INTEGRAL Likelihood = -3.88 Transmembrane 164 - 180 ( 162 - 183)
INTEGRAL Likelihood = -3.56 Transmembrane 144 - 160 ( 136 - 161)
INTEGRAL Likelihood = -2.87 Transmembrane 317 - 333 ( 316 - 334)
INTEGRAL Likelihood = -2.71 Transmembrane 374 - 390 ( 374 - 393)
INTEGRAL Likelihood = -0.96 Transmembrane 44 - 60 ( 44 - 62)
INTEGRAL Likelihood = -0.80 Transmembrane 15 - 31 ( 15 - 32)


Final Results 

bacterial membrane Certainty=0 .4376 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 
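
Each "Final Results" block lists one certainty value per candidate localization, and the entry flagged "Affirmative" is the one with the highest value. The Python sketch below is illustrative only and is not part of the original analysis; `parse_final_results` is a hypothetical helper. It reads the three lines of the block above, tolerating the stray spaces that text extraction has introduced inside the numbers, and selects the highest-scoring localization.

```python
import re

# "bacterial <location> ... Certainty=<value> (<verdict>)", allowing
# extraction noise such as "0 .4376" or "0 . 0000" in the value.
RESULT = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9][0-9. ]*?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for a Final Results block."""
    results = {}
    for location, value, verdict in RESULT.findall(text):
        certainty = float(value.replace(" ", ""))  # drop stray spaces
        results[location] = (certainty, verdict.strip())
    return results


# Lines copied from the block above, extraction noise included.
block = """
bacterial membrane Certainty=0 .4376 (Affirmative) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco
"""

results = parse_final_results(block)
best = max(results, key=lambda loc: results[loc][0])
print(best, results[best])
```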

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB52225 GB:Z98171 EpsU protein [Streptococcus thermophilus] 
Identities = 189/462 (40%), Positives = 313/462 (66%)

Query: 1 MKLLK^FYNTSYQLLTLLLPLVTVPYVSRVLSPQGIGINAYTSSIvMYFTLFGALGISL 60
M+++KN YN YQ+ +++PL+T+PY+SR+L P GIGIN+YT+SIV YF LFG++G+ L
Sbjct: 1 MQIVKNYLYNAIYQVFIIIVPLLT1PYLSRILGPSGIGINSYTNSIVQYFVLFGSIGLGL 60

Query: 61 YGmEIAFVQSNiaKRSKIFWELVVLKLASVSIATLLFFGFVLLTOEWQLFYLIQGINLL, 120
YGNR+IAFV+ N+ K SK+F+E+ +L+L ++ +A LF F+++ ++ +YL Q 1 ++
Sbjct: 61 YGNRQIAFVRDNQVKMSKVFY^IFILRLFTICLAYFLFVAFLIINGQYYAYYLSQSIAIV 120

Query: 121 ATATDISWYFIGVEDFKIIVIRNTIVKLITVVLTFLVVKTPDDLALYMFIjIAFASLLGNL 180
A A DISW F+G+E+FK+IV+RN IVKL+ + FL VK+ +DL +Y+ + ++L+GNL
Sbjct: 121 AAAFDISWAFMGIENFKyiVLRNFIVKLIALFSIFLFWSYJ^MIYILITVLSTLIGNL 180

Query: 181 TVWHHLKHE 1 1 KI PFSRLD I LIHLRPTLMLFLPQITMQIYLSLNKSMLGAMDS WSAGYF 240
T + L ++K+ + L + HL+ +L++F+PQI +QIY LNK+MLG+ +DS V S+G+F
Sbjct: 181 TFFPSLHRYLVKVNYRELRPIKHLKQSLVMFIPQIALQIYWVLNKTMLGSLDSVTSSGFF 240

Query: 241 DQSDKIIRILFTIVSAIGGVFLPRLSSLFSSGKEKQAKALLLKLVDLSNAISMLMIAGW 300
DQSDKI++++ IV+A G V LPR+++ F+ + + K + +AIS+ M+ G++
Sbjct: 241 DQSDKIVKLVIAIVTATGTVMLPRVANAFAHREYSKIKEYMYAGFSFVSAISIPMMFGLI 300

Query: 301 GVSSTFAVFFFGKGYEAVGPLMAVESLMIICISYGNALGTQYLIASRRTKAYTMSAVIGIi 360
++ F FF + V P++ +ES+ II I++ NA+G OYLL + + K+YT+S +IG
Sbjct: 301 AITPKFVPLFFTSQFSDVIPVLMIESIAIIFIAWSNAIGNQYLLPTNQNKSYTVSVIIGA 360

Query: 361 VANWLNILLIPILGAMGAIISTVITEFIVSLYQAISLRDVFTFKELTRGMLRYLIAATL 420
+ N++LNI LI LGA+GA I+TVI+E V++YQ + L + +YLIA +
Sbjct: 361 IVNLMLNIPLIIYLGAVGASIATVISEMSVTVYQLFIIHKQLNLHTLFSDLSKYLIAGLV 420

Query: 421 SGAVLYYINTQMSVSLVNYVIQSLVAVTIYVGIVFITKM»VI 462
+++ 1+ S + +++ V + IY+ ++ KA +1
Sbjct: 421 MFLIVFKISLLTPTSWIFILLEITVGIIIYIVLLIFLKAEII 462





No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1208 

A DNA sequence (GBSx1284) was identified in S.agalactiae <SEQ ID 3757> which encodes the amino
acid sequence <SEQ ID 3758>. Analysis of this protein sequence reveals the following:

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 . 1742 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1209 

A DNA sequence (GBSx1285) was identified in S.agalactiae <SEQ ID 3759> which encodes the amino
acid sequence <SEQ ID 3760>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1210 

A DNA sequence (GBSx1286) was identified in S.agalactiae <SEQ ID 3761> which encodes the amino
acid sequence <SEQ ID 3762>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood =-10.56 Transmembrane 214 - 230 ( 210 - 236)
INTEGRAL Likelihood =-10.03 Transmembrane 364 - 380 ( 361 - 386)
INTEGRAL Likelihood = -7.96 Transmembrane 272 - 288 ( 271 - 291)
INTEGRAL Likelihood = -6.95 Transmembrane 23 - 39 ( 20 - 41)
INTEGRAL Likelihood = -5.57 Transmembrane 191 - 207 ( 189 - 209)
INTEGRAL Likelihood = -5.15 Transmembrane 434 - 450 ( 425 - 451)
INTEGRAL Likelihood = -4.25 Transmembrane 143 - 159 ( 138 - 162)
INTEGRAL Likelihood = -3.13 Transmembrane 167 - 183 ( 166 - 186)
INTEGRAL Likelihood = -1.44 Transmembrane 400 - 416 ( 400 - 416)
INTEGRAL Likelihood = -1.33 Transmembrane 333 - 349 ( 333 - 349)
INTEGRAL Likelihood = -0.80 Transmembrane 232 - 248 ( 232 - 251)


Final Results

bacterial membrane Certainty=0.5225 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1211 

A DNA sequence (GBSx1287) was identified in S.agalactiae <SEQ ID 3763> which encodes the amino
acid sequence <SEQ ID 3764>. This protein is predicted to be rhamnosyltransferase. Analysis of this
protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1792 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9981> which encodes amino acid sequence <SEQ ID 9982>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF18951 GB:AF155805 Cps9H [Streptococcus suis]
Identities = 53/116 (45%), Positives = 75/116 (63%), Gaps = 4/116 (3%)

Query: 6 VL^TYNGQGFIHDQLDSIOTQTLRPDYVLMRDDGSTDDWKVVEDYIKEHRLDGWSITS 65 

VLMATYNG FI QLDSIRNQ++ D V++ DD STDDT+K+++DYIK++ LD W ++ 
Sbjct: 4 VLMATYNGSPFIIKQLDSIRNQSVSADKVIIWDDCSTDDTIKIIKDYIKKYSLDSWWSQ 63 


Query: 66 MDKNLGWRLNFRQLL IDVLAYEVD WFFSDQDDTWYHHKNKMQVD IMEERQD INLL 121 

N N G F L + VFFSDQDD W HK + +1 +R++++++ 

Sbjct: 64 NKSNQGHYQTFINL TKLVQEGIVFFSDQDDIWDCHKIETMLPIF-DRENVSMV 115 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1212 

A DNA sequence (GBSx1288) was identified in S.agalactiae <SEQ ID 3765> which encodes the amino
acid sequence <SEQ ID 3766>. This protein is predicted to be rhamnosyltransferase. Analysis of this
protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1278 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9983> which encodes amino acid sequence <SEQ ID 9984>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF18951 GB:AF155805 Cps9H [Streptococcus suis]
Identities = 57/146 (39%), Positives = 81/146 (55%), Gaps = 8/146 (5%)

Query: 10 VL^TYNGEIFISEQLDSIRQQTLKPDYVIjLRDDCSTDETTOvVNNYIAKHELEGWKIVK 69

VLMATYNG FI +QLDSIR Q++ D V++ DDCSTD+T+ ++ +YI K+ L+ W + + 
Sbjct: 4 VLMATYNGSPFIIKQLDSIRNQSVSADKVIIWDDCSTDDTIKIIKDYIKKYSLDSWWSQ 63 

Query: 70 NDKNLGWRLNFRQLLIDVLAYEVDYVFFSDQDDIWYLDKNERQFAIMSDKPQIEVLSADV 129 
N N G F L + VFFSDQDDIW K E I D+ + + V

Sbjct: 64 NKSNQGHYQTFINL TKLVQEGI VFFSDQDDIWDCHKI ETMLPI F - DRENVSM V 115 

Query: 130 DIKTMSTEASVPHFLTFSSSDRISQY 155 
K+ + + + +SDRI+ Y 

Sbjct: 116 FCKSRLIDENGNIISSPDTSDRINTY 141

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1213 

A DNA sequence (GBSx1289) was identified in S.agalactiae <SEQ ID 3767> which encodes the amino
acid sequence <SEQ ID 3768>. This protein is predicted to be dTDP-glucose 4-6-dehydratase (galE).
Analysis of this protein sequence reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.02 Transmembrane 250 - 266 ( 250 - 266)

Final Results

bacterial membrane Certainty=0.1808 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9985> which encodes amino acid sequence <SEQ ID 9986>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAC14890 GB:AJ295156 d-TDP-glucose dehydratase [Phragmites australis]

Identities = 108/327 (33%), Positives = 170/327 (51%), Gaps = 22/327 (6%)

Query: 29 ANKGVLI SGSNSMIiASYMVFLLAYLNETRNYQTQI IATARNIEKARDKFSDL VGKDYFTL 88 

AN +L++G + S++V L N + ++I ++D +G F L 

Sbjct: 33 ANLRILVTGGAGFIGSHLVDKLM ENEKHEVIVADNFFTGSKDNLKKWIGHPRFEL 87 


Query: 89 IPYDVEERLEYDGKVDYIIHAASNASPTAILSNPVSIIKANTIGTLNLLDFAKEKTIENF 148 

I +DV + L + VD I H A ASP NPV IK N IGTLN+L AK + 

Sbjct: 88 IRHDVTQPLLVE- - VDQI YHLACPASPI FYKHNPVKTIKTNVIGTLNMLGLAK- RVGARI 144 

Query: 149 LFLSTREVYGTSIKEVIDEEAYGGFDILATRACYPESKRMAETLLQSYYDQYKVPFTIAR 208

L ST EVYG ++ E +G + + R+CY E KR+AETL+ Y+ Q+ + IAR 
Sbjct: 145 LLTSTSEVYGDPLEHPQTEAYWGNVNPIGVRSCYDEGKRVAETLMFDYHRQHGIEIRIAR 204 

Query: 209 IAHSFGPGMELGNDGRIMISroLLSOTIIXSKDIvLKSSGTAERAFCYLADAVSGLFTILLNG 268 
I +++GP M + +DGR++++ ++ + G + ++ GT R+FCY+AD V GL L+NG

Sbjct: 205 IFNTYGPRMNI-DDGRWSNFIAQAvRGDPLTVQKPGTQTRSFCYVADMVDGLIK-LMNG 262 

Query: 269 EVGQAYNVANEDQPIMIKDLAQKLVDLFSDKNISWFDIPKTMSAGYSKMGRTR---LTM 325 
N+ N + M+ +LA+K+ +L + ++ TM+ R R +T 

Sbjct: 263 NNTGPINLGNPGEFTML-ELAEKVKELINP EVTVTMTENTPDDPRQRKPDITK 314

Query: 326 AKLEALGWKREVSLESGILKTVQAFEE 352 

AK E LGW+ +V L G++ F E 

Sbjct: 315 AK-EVLGWEPKWLRDGLVLMEDDFRE 340 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1214 

A DNA sequence (GBSx1290) was identified in S.agalactiae <SEQ ID 3769> which encodes the amino
acid sequence <SEQ ID 3770>. Analysis of this protein sequence reveals the following:

Possible site: 53

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9987> which encodes amino acid sequence <SEQ ID 9988>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB11866 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis]
Identities = 77/231 (33%), Positives = 131/231 (56%), Gaps = 6/231 (2%)

Query: 13 VI FAGGVGRRMNTKGKPKQFLEVHGKPI IVHTIDIFQNTEA.IDAWWCVSDWLDYMNNL 72 

VI A G G+RM G+ K F+E+ G P+I+HT+ +F + D +++V ++ L 

Sbjct: 6 VIPAAGQGKRMKa-GRNKLFIELKGDPVIIHTLRVFDSHRQCDKIILVIWEQEREHFQQL 64 

Query: 73 VERFNLTKVKAWAGGETGQMS I FKGLEAAEQLATDDAVVLIHDGVRPLINEEVINANIQ 132 

+ + +VAGG+ Q S++KGL+A +Q + +VL+HDG RP I E 1+ I 
Sbjct: 65 LSDYPFQTS IELVAGGDERQHSVYKGLKAVKQ EKIVLVHDGARPFIKHEQIDELIA 120 

Query: 133 SVKETGSAOTSVRAKETVVLVNDSSKISEVVDRTRSFIAKAPQSFYLSDILSVERDAISK 192 

++TG+A+ +V K+T+ V D ++SE ++R+ + + PQ+F LS ++ +A K 
Sbjct: 121 EAEQTGAAILAVPVKDTIKRVQDL-QVSETIERSSLWAVQTPQAFRLSLLMKAHAEAERK 179 

Query: 193 GITDAIDSSTLMGMYNRELTIVEGPYENIKITTPDDFYMFKALYDARENEQ 243 

G D+S + M + +VEG Y NIK+TTPDD +A+ ++ + 

Sbjct: 180 GFLGTDDASLVEQMEGGSVRWEGSYTNIKLTTPDDLTSAEAIMESESGNK 230 
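
The summary line that heads each alignment can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis, that parses such a line and recomputes the bracketed percentages. The function names are hypothetical, and the assumption that percentages are truncated rather than rounded is inferred only from the figures printed in this section (for example, 131/231 is 56.7% but is reported as 56%).

```python
import re

# Matches entries such as "Identities = 77/231 (33%)" and tolerates
# the stray spaces found in the extracted text.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {name: (count, alignment_length, printed_percent)}."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in SUMMARY_RE.findall(line)
    }


def truncated_percent(count, length):
    """Integer percentage, truncated (assumed convention, see lead-in)."""
    return (100 * count) // length


line = "Identities = 77/231 (33%) , Positives = 131/231 (56%) , Gaps = 6/231 (2%)"
stats = parse_summary(line)
for name, (count, length, pct) in stats.items():
    # Flag any entry whose printed percentage disagrees with its ratio.
    status = "ok" if truncated_percent(count, length) == pct else "mismatch"
    print(f"{name}: {count}/{length} -> {pct}% ({status})")
```

A check of this kind is useful mainly for spotting digits damaged during text extraction, since the ratio and the percentage are printed independently.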

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3770 (GBS647) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 130 (lanes 9 & 10; MW 55.9kDa + lane 8; MW 27kDa) and in Figure 186 (lane 5; 
MW 56kDa). It was also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 130 (lane 12; MW 31kDa), in Figure 140 (lane 9; MW 31kDa) and in Figure 
178 (lane 6; MW 31kDa). 

Purified GBS647-GST is shown in Figure 243, lane 4; purified GBS647-His is shown in Figure 229, lane 6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1215 

A DNA sequence (GBSx1291) was identified in S.agalactiae <SEQ ID 3771> which encodes the amino 
acid sequence <SEQ ID 3772>. This protein is predicted to be LicD1. Analysis of this protein sequence 
reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2647 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 
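
For readers who wish to tabulate the localisation predictions across Examples, a "Final Results" block can be read as a list of (location, certainty, verdict) entries. The sketch below is illustrative only; the function names are hypothetical, and the parser simply tolerates the stray spaces that text extraction can introduce inside the certainty value. It does not reproduce the prediction program itself.

```python
import re

# One "Final Results" line, e.g.
#   bacterial cytoplasm Certainty=0.2647 (Affirmative)
RESULT_RE = re.compile(
    r"(bacterial\s+\w+)[\s\-]*Certainty\s*=\s*([\d .]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, verdict) tuples."""
    results = []
    for location, value, verdict in RESULT_RE.findall(text):
        certainty = float(value.replace(" ", ""))  # "0 . 2647" -> 0.2647
        results.append((" ".join(location.split()), certainty, verdict.strip()))
    return results


def best_localization(results):
    """Pick the entry with the highest certainty, or None if empty."""
    return max(results, key=lambda r: r[1]) if results else None


block = """
bacterial cytoplasm Certainty=0 . 2647 (Affirmative)
bacterial membrane Certainty=0 . 0000 (Not Clear)
bacterial outside Certainty=0 . 0000 (Not Clear)
"""
parsed = parse_final_results(block)
print(best_localization(parsed))
```

When every certainty is 0.0000, as in Example 1214, the first entry is returned by `max`, so such a result should be read as "no clear prediction" rather than as a positive call.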

A related GBS nucleic acid sequence <SEQ ID 9989> which encodes amino acid sequence <SEQ ID 9990> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 






>GP:AAD37094 GB:AF106539 LicD2 [Streptococcus pneumoniae] 
Identities = 85/271 (31%) , Positives = 130/271 (47%) , Gaps = 15/271 (5%) 

Query: 1 MKEMWSEIREVQLEMLAYIDKVARDNKIEYSLGGGSLLGAMRHKGFIPWDDDIDLMLER 60 

M+ + EI+E+QL +L YID+ + + I Y L G++LGA+RHKG IPWDDDID+ L R 
Sbjct: 1 MQYLEKKEIKEIQIALLDYIDETCKKHDIPYFLSYGTMLGAIRHKGMIPWDDDIDISLYR 60 



Query: 61 SQYERLMKALADANNSDFKLLHHSVEKNLW PFAKLYHTKSMYLSKTDRIHPWTGIFI 117 

YERL+K + + N+ +K+L S + + W FA + T ++ T +FI 

Sbjct: 61 EDYERLLKIIEEENHPRYKVL--SYDTSSWYFHNFASILDTSTVIEDHVKYKRHDTSLFI 118 
Query: 118 DIFPLDRLPESAEERQRFFKKVHSAAANLMCTTYPNFASGSRKLYANARLILGLP-RFIA 176 

D+FP+DR + +++ +AL G KL RL RF+ 

Sbjct: 119 DVFPIDRFTDLSIVDKSY KYVALRQIiAYIKKSRAVHGDSKLKDFLRLCSWYALRFVN 175 

Query: 177 YHGQAKKRAEIVDQVMETYNNQEVPYMGYTD-SRYRLKEYFPREIFSEYEDVMFENIKTR 235 

KK +DQ+++ Y G + +KE FP + F E FE 

Sbjct: 176 PRYFYKK IDQLVKNAVTNTPQYEGGVGIGKEGMKEIFPVDTFKELILTEFEGRMLP 231 

Query: 236 KIKNEHAYLNQLYGGSYMELPPESKRESHSY 266 
K +L Q+Y G YM P + +E +S+ 

Sbjct: 232 VPKKYDQFLTQMY-GDYMTPPSKEMQEWYSH 261 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1216 

A DNA sequence (GBSx1292) was identified in S.agalactiae <SEQ ID 3773> which encodes the amino 
acid sequence <SEQ ID 3774>. Analysis of this protein sequence reveals the following: 

Possible site: 18 
>>> May be a lipoprotein 

INTEGRAL Likelihood =-12.05 Transmembrane 554 - 570 ( 547 - 575) 

Final Results 

bacterial membrane Certainty=0.5819 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3774 (GBS182d) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 184 (lane 8; MW 62kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1217 

A DNA sequence (GBSx1293) was identified in S.agalactiae <SEQ ID 3775> which encodes the amino 
acid sequence <SEQ ID 3776>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.4653 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1218 

A DNA sequence (GBSx1294) was identified in S.agalactiae <SEQ ID 3777> which encodes the amino 
acid sequence <SEQ ID 3778>. This protein is predicted to be DOLICHYL-PHOSPHATE MANNOSE 
SYNTHASE RELATED PROTEIN. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -2.92 Transmembrane 232 - 248 ( 231 - 248) 

Final Results 

bacterial membrane Certainty=0.2168 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9991> which encodes amino acid sequence <SEQ ID 9992> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC35924 GB:AF071085 putative glycosyl transferase [Enterococcus 
faecalis] 

Identities = 118/240 (49%), Positives = 152/240 (63%), Gaps = 1/240 (0%) 


Query: 14 KILLVIPAYNEEGSIAKTVQTIVDFKASRS-LPFELDYIVINDGSTDGTPELLDRLGLNH 72 

K+LL+IPAYNEE +I +T+ +I FK + ELDY+VINDGSTDGT ++L+ +N 
Sbjct: 2 KVLLIIPAYNEEENILRTIASIETFKQEVTHFQHELDYVVINDGSTDGTKQILEVNQINA 61 

Query: 73 IDLVQNLGIGGCVQTGYLYANRNHYDVAVQFDGDGQHDIRSIEDVVMPILNDEADFVIGS 132 

I LV NLGIGG VQTGY YA N YDVA QFDGDG HDI S+ ++ P+ F GS 

Sbjct: 62 IHLVLNLGIGGAVQTGYKYALENEYDVAXQFDGDGXHDIXSLPILLEPLAEGXCXFSXGS 121 

Query: 133 RFVDKKHQNFQSTAMRRLGINLISAAIKLTTGHKVYDTTSGYRAANAALIAYLSCHYPVQ 192 
RF+ +FQS MRR GI L+S G +Y T G RA N +IA+ + YP 

Sbjct: 122 RFIPGNXASFQSXKMRRXGIRLLSFCXXXAXGXTIYXVTXGXRAGNRKVIAFFAKRYPTN 181 

Query: 193 YPEPESTARILKKGYRLKEVTANMFEREAGTSSISSLKSIFYMTDVLTSIIIAGFIKEDD 252 
YPEPES ++KK + + E NM ER G SSI +L S+ YM +V ++I+IA F+KE D 
Sbjct: 182 YPEPESIVHLIKKRFVIVERPVNMMERLGGVSSIRALASVKYMLEVGSAILIAPFMKEGD 241 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3779> which encodes the amino acid 
sequence <SEQ ID 3780>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.80 Transmembrane 211 - 227 ( 211 - 227) 

Final Results 

bacterial membrane Certainty=0.1319 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAC35924 GB:AF071085 putative glycosyl transferase [Enterococcus 
faecalis] 

Identities = 104/233 (44%) , Positives = 134/233 (56%) , Gaps = 9/233 (3%) 

Query: 1 VKKLIIIPAYNESSNIVNTIRTIESDAPD-------FDYIIIDDCSTDNTLAICQKQGFN 53 

+K L+IIPAYNE NI+ TI +IE+ + DY++I+D STD T I + N 

Sbjct: 1 MKVLLIIPAYNEEENILRTIASIETFKQEVTHFQHELDYVVINDGSTDGTKQILEVNQIN 60 

Query: 54 VISLPINLGIGGAVQTGYRYAQRCGYDVAVQVDGDGQHNPCYLEKMVEVLVQSSVNMVIG 113 

I L +NLGIGGAVQTGY+YA YDVA Q DGDG H+ L ++E L + G 
Sbjct: 61 AIHLVLNLGIGGAVQTGYKYALENEYDVAXQFDGDGXHDIXSLPILLEPLAEGXCXFSXG 120 
Query: 114 SRFI--TKEGFQSSFARRIGIKYFTWLIALLTGKKITDATSGLRLIDRSLIERFANHYPD 171 

SRFI FQS RR GI+ ++ G I T G R +R +I FA YP 

Sbjct: 121 SRFIPGNXASFQSXKMRRXGIRLLSFCXXXAXGXTIYXVTXGXRAGNRKVIAFFAKRYPT 180 

Query: 172 DYPEPETVVDVLVSHFKVKEIPVVMNERQGGVSSISLTKSVYYMIKVTLAILV 224 

+YPEPE++V ++ F + E PV M ER GGVSSI SV YM++V AIL+ 
Sbjct: 181 NYPEPESIVHLIKKRFVIVERPVNMMERLGGVSSIRALASVKYMLEVGSAILI 233 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 105/231 (45%) , Positives = 142/231 (61%) , Gaps = 8/231 (3%) 

Query: 14 KILLVIPAYNEEGSIAKTVQTIVDFKASRSLPFELDYIVINDGSTDGTPELLDRLGLNHI 73 

K L++IPAYNE +I T++TI S + DYI+I+D STD T + + G N I 

Sbjct: 2 KKLIIIPAYNESSNIVNTIRTI------ESDAPDFDYIIIDDCSTDNTLAICQKQGFNVI 55 

Query: 74 DLVQNLGIGGCVQTGYLYANRNHYDVAVQFDGDGQHDIRSIEDVVMPILNDEADFVIGSR 133 

L NLGIGG VQTGY YA R YDVAVQ DGDGQH+ +E +V ++ + VIGSR 
Sbjct: 56 SLPINLGIGGAVQTGYRYAQRCGYDVAVQVDGDGQHNPCYLEKMVEVLVQSSVNMVIGSR 115 

Query: 134 FVDKKHQNFQSTAMRRLGINLISAAIKLTTGHKVYDTTSGYRAANAALIAYLSCHYPVQY 193 

F+ K + FQS+ RR+GI + I L TG K+ D TSG R + +LI + HYP Y 
Sbjct: 116 FITK--EGFQSSFARRIGIKYFTWLIALLTGKKITDATSGLRLIDRSLIERFANHYPDDY 173 

Query: 194 PEPESTARILKKGYRLKEVTANMFEREAGTSSISSLKSIFYMTDVLTSIII 244 

PEPE+ +L +++KE+ M ER+ G SSIS KS++YM V 
Sbjct: 174 PEPETVVDVLVSHFKVKEIPVVMNERQGGVSSISLTKSVYYMIKVTLAILV 224 

A related GBS gene <SEQ ID 8751> and protein <SEQ ID 8752> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: 0.29 
GvH: Signal Score (-7.5): -4.34 

Possible site: 29 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 1 value: -2.92 threshold: 0.0 

INTEGRAL Likelihood = -2.92 Transmembrane 222 - 238 ( 221 - 238) 
PERIPHERAL Likelihood = 4.40 4 
modified ALOM score: 1.08 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.2168 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

ORF00548 (340 - 1056 of 1359) 

GP|3608398|gb|AAC35924.1||AF071085(2 - 241 of 241) putative glycosyl transferase 
{Enterococcus faecalis} 
%Match =24.7 

%Identity =49.2 %Similarity =64.2 
Matches = 118 Mismatches = 85 Conservative Sub.s = 36 

249 279 309 339 369 399 429 456 

L*QD*GGYGNMVIAKINLSIKLCLNG*XQQIIXIRDKMMKKILLVIPAYNEEGSIAKTVQTIVDFKASRS-LPFELDYIV 

hIMIIIIII :| =|: =111 = = MM 
MKVLLIIPAYNEEENILRTIASIETFKQEVTHFQHELDYVV 
10 20 30 40 



486 516 546 576 606 636 666 696 

INDGSTDGTPELLDRLGLNHIDLVQNLGIGGCVQTGYLYANRNHYDVAVQFDGDGQHDIRSIEDVVMPILNDEADFVIGS 
INDGSTDGTKQILEVNQINAIHLVLNLGIGGAVQTGYKYALENEYDVAXQFDGDG 

60 70 80 90 100 110 120 

726 756 786 816 846 876 906 936 

RFVDKKHQNFQSTAMRRLGINLISAAIKLTTGHKVYDTTSGYRAANAALIAYLSCHYPVQYPEPESTARILKKGYRLKEV 

11= =111 III II 1=1 I =1 I I II I :||:=: II llllll ::|| : : I 

RFIPGNXASFQSXKMRRXGIRLLSFCXXXAXGXTIYXVTXGXRAGNRKVJA^ 

140 150 160 170 180 190 200 

966 996 1026 1056 1086 1116 1146 1176 

TANMFEREAGTSSISSLKSIFYMTDVLTSIIIAGFIKEDDK*V*HCK1KCLF*PLSYFI*L*EWLIKTHFLLNVIjYLGY* 

II II I III =1 1= II :| ::|:|| 1=11 I 
PVNMMERLGGVSSIRALASVKYMLEVGSAILIAPFMKEGD 
220 230 240 


SEQ ID 8752 (GBS355) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 74 (lane 4; MW 27kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 7; MW 52kDa). 

GBS355-GST was purified as shown in Figure 213 (lane 4) and in Figure 216 (lane 6). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1219 

A DNA sequence (GBSx1295) was identified in S.agalactiae <SEQ ID 3781> which encodes the amino 

acid sequence <SEQ ID 3782>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.91 Transmembrane 185 - 201 ( 185 - 201) 

Final Results 

bacterial membrane Certainty=0.1765 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA32090 GB:AB010970 rhamnosyltransferase [Streptococcus mutans] 

Identities = 181/315 (57%) , Positives = 244/315 (77%) , Gaps = 7/315 (2%) 

Query: 1 MKVNILMATYNGEKFLAQQIESIQKQTFKEWNLLIRDDGSSDKTCDIIRNFTAKDSRIRF 60 
MKVNILM+TYNG++F+AQQI+SIQKQTF+ WNLLIRDDGSSD T II +F D+RIRF 
Sbjct: 1 MKVNILMSTYNGQEFIAQQIQSIQKQTFENWNLLIRDDGSSDGTPKIIADFAKSDARIRF 60 

Query: 61 INENEHHNLGVIKSFFTLVNYEVADFYFFSDQDDVWLPEKLSVSLEAAKHKASDVPLLVY 120 
IN ++ N GVIK+F+TL.+ YE AD+YFFSDQDDVWLP+KL ++L + + + + +PL+VY 
Sbjct: 61 

Query: 121 
TDL W+++L +L DSMI+ QSHHANT+LL ELTENTVTGGTMM+NH LA++W +D+ 
Sbjct: 121 

Query: 180 
+MHDW+LALIAASLG++IYLD T+LYRQH++NVLGART KR K LR P + +YW 
Sbjct: 181 

Query: 239 
L+ SQ+QAS +++ D+ AN +I+ ++ + Q F+ R++WL +YG++KN+ H 
Sbjct: 239 

Query: 299 VVFKWLIATNYYNKR 313 
VFK LI T + +R 
Sbjct: 296 FVFKTLIITKFGYRR 310 

A related DNA sequence was identified in S.pyogenes <SEQ ID 817> which encodes the amino acid 
sequence <SEQ ID 818>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1980 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 178/314 (56%) , Positives = 232/314 (73%) , Gaps = 6/314 (1%) 

Query: 1 MKVNILMATYNGEKFLAQQIESIQKQTFKEWNLLIRDDGSSDKTCDIIRNFTAKDSRIRF 60 

M +NIL++TYNGE+FLA+QI+SIQ+QT +W LLIRDDGS+D T DIIR F +D RI++ 
Sbjct: 1 MNINILLSTYNGERFLAEQIQSIQRQTVNDWTLLIRDDGSTDGTQDIIRTFVKEDKRIQW 60 




Query: 61 INENEHHNLGVIKSFFTLVNYEVADFYFFSDQDDVWLPEKLSVS-LEAAKHKASDVPLLV 119 

INE + NLGVIK+F+TL+ ++ AD YFFSDQDD+WL KL V+ LEA KH+ + PLLV 
Sbjct: 61 INEGQTENLGVIKNFYTLLKHQKADVYFFSDQDDIWLDNKLEVTLLEAQKHEMT-APLLV 119 

Query: 120 YTDLKWTNQELNILQDSMIRAQSHHANTTLLPELTENTVTGGTMMINHALAEKWFTPNDI 179 

YTDLKOT Q L + DSMI+ QS HANT+LL ELTENTVTGGTMMI HALAE+W T + + 
Sbjct: 120 YTDLKVWQHLAVCHDSMIKTQSGHANTSLLQELTENTOTGGTNIMITHALAEEWTTCDGL 179 

Query: 180 LMHDWFLALLAASLGEIIYLDLPTQLYRQHDNNVLGARTMDKRFKILREGPKSIFTRYWK 239 
LMHDW+LALLA+++G+++YLD+PT+LYRQHD NVLGART KR K P + +YW 

Sbjct: 180 LMHDWYLALLASAIGKLVYLDIPTELYRQHDANVLGARTWSKRMKNWLT-PHHLVNKYWW 238 

Query: 240 LIHDSQKQASLIVDKYGDIMTANDLELIKCFIKIDKQPFMTRLRWLWKYGYSKNQFKHQV 299 
LI SQKQA L++D + ND EL+ ++ + PF RL L +YG+ KN+ H 

Sbjct: 239 LITSSQKQAQLLLDL---PLKPNDHELVTAYVSLLDMPFTKRLATLKRYGFRKNRIFHTF 295 

Query: 300 VFKWLIATNYYNKR 313 

+F+ L+ T + +R 
Sbjct: 296 IFRSLVVTLFGYRR 309 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1220 

A DNA sequence (GBSx1296) was identified in S.agalactiae <SEQ ID 3783> which encodes the amino 
acid sequence <SEQ ID 3784>. This protein is predicted to be rgpAc. Analysis of this protein sequence 
reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1881 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9993> which encodes amino acid sequence <SEQ ID 9994> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 
>GP:BAA32089 GB:AB010970 rgpAc [Streptococcus mutans] 
Identities = 234/362 (64%) , Positives = 284/362 (77%) 

Query: 33 VSELINHQKSFDIKYHVACLSDKEHHTHFNFADADCFTINPPQLGPARVIAYDIMAINYA 92 
+ EL+ +++S + YHVACLS+ + H HF + DCFTI P+LGPARVIAYD+MAI YA 

Sbjct: 1 MEELVKYKQSQQLTYHVACLSETDQHKHFTYLGVDCFTIKAPKLGPARVIAYDMMAIRYA 60 

Query: 93 LDLVKTHDLKEPIFYILGNTIGAFIWHFANKIHKVGGLLYVNPDGLEWKRSKWSRPTQRY 152 
L L+K +K PIFYILGNTIGAF+ FA KI ++GG Y+NPDGLEW+RSKWSRP Q Y 
Sbjct: 61 LKLIKDQKIKHPIFYILGNTIGAFMGPFARKIKRIGGRFYINPDGLEWRRSKWSRPVQAY 120 

Query: 153 LKYAEKCMTKNADLIISDNIGIENYIQSTYSNVKTRFIAYGTEINSRKLSSDDPRVKQLF 212 

LKYAEKCMTK ADL+ISDN GIE YI+ Y KT FIAYGT+++ L +D +VK + 
Sbjct: 121 LKYAEKCMTKKADLVISDNTGIEGYIKQMYPWAKTTFIAYGTDLSPSGLLKM3SKVKDFY 180 


Query: 213 KKWNIKSKGYYLIVGRFVPENNYETAIREFMASDTKRDLVIICNHQNNPYFEKLSLKTNL 272 

KKW IK KGYYLIVGRFVPENNYETAIREFM S ++RDLVIICN++ N YFE L KT 
Sbjct: 181 KKWAIKEKGYYLIVGRFVPENNYETAIREFMTSSSERDLVIICNYEGNAYFEDLRQKTEF 240 

Query: 273 QQDKRVKFVGTLYEKDLLDYVRQQAFAYIHGHEVGGTNPGLLEALANTDLNLVLDVDFNK 332 

+DKR+KFVGT+Y++ LL Y+R+QAFAYIHGHEVGGTNPGLLEALA+TDLNLVL +FN 
Sbjct: 241 DKDKRIKFVGTVYDRPLLTYIREQAFAYIHGHEVGGTNPGLLEALAHTDLNLVLITEFNY 300 

Query: 333 SVAGLSSFYWAKKEGDLAKLINDSDQQQDLSTYGDRAKAIIQENYTWKKIVEEYEDLFLN 392 
+VA ++ YW + G LA+LIN D+Q++ + YG RAK II YTW+KIVEEYEDLFL+ 

Sbjct: 301 TVALDAARYWTQDNGSLAQLINQFDKQENFAEYGQRAKEIIVNYYTWEKIVEEYEDLFLH 360 

Query: 393 ES 394 
ES 

Sbjct: 361 ES 362 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3785> which encodes the amino acid 

sequence <SEQ ID 3786>. Analysis of this protein sequence reveals the following: 

Possible site: 23 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.38 Transmembrane 95 - 111 ( 95 - 111) 

Final Results 

bacterial membrane Certainty=0.1553 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 250/383 (65%) , Positives = 307/383 (79%) 


Query: 11 MQDVFIIGSRGLPARYGGFETFVSELINHQKSFDIKYHVACLSDKEHHTHFNFADADCFT 70 

MQDVFIIGSRGLPA+YGGFETFV ELI+HQ S +I+YHVACLSD +H HF++ ADCF 
Sbjct: 1 MQDVFIIGSRGLPAKYGGFETFVEELISHQSSKNIRYHVACLSDTKHKVHFDYKGADCFY 60 

Query: 71 INPPQLGPARVIAYDIMAINYALDLVKTHDLKEPIFYILGNTIGAFIWHFANKIHKVGGL 130 

+NPP+LGPARVIAYD+MAI YAL H ++ PIFY+LGNT+GAFI F +IH GG 

Sbjct: 61 LNPPKLGPARVIAYDMMAITYALSYSDQHQIQNPIFYVLGNTVGAFIAPFVKQIHNRGGR 120 

Query: 131 LYVNPDGLEWKRSKWSRPTQRYLKYAEKCMTKNADLIISDNIGIENYIQSTYSNVKTRFI 190 
++NPDGLEWKRSKWSRP Q YLK++EK MT+ ADL+ISDNIGI+ Y++ Y KT FI 

Sbjct: 121 FFINPDGLEWKRSKWSRPVQAYLKFSEKQMTRQADLVISDNIGIDRYLKQVYPWSKTCFI 180 

Query: 191 AYGTEINSRKLSSDDPRVKQLFKKWNIKSKGYYLIVGRFVPENNYETAIREFMASDTKRD 250 
AYGT+ +L++ D +V+ F+ ++I+ K YYLI+GRFVPENNYETAI+EFMAS TKRD 
Sbjct: 181 AYGTQTQPSRLATADSKVRAYFQTFDIREKDYYLILGRFVPENNYETAIKEFMASSTKRD 240 

Query: 251 LVIICNHQNNPYFEKLSLKTNLQQDKRVKFVGTLYEKDLLDYVRQQAFAYIHGHEVGGTN 310 

LVIICNH+ N YF++L +T +D R+KFVGTLY+K+LL Y+R+QA+AYIHGHEVGGTN 
Sbjct: 241 LVIICNHEGNAYFKQLLAETECDKDPRIKFVGTLYDKELLAYIREQAYAYIHGHEVGGTN 300 



Query: 311 PGLLEALANTDLNLVLDVDFNKSVAGLSSFYWAKKEGDLAKLINDSDQQQDLSTYGDRAK 370 

PGLLEALA+T+LNLVL VDFN+SVA ++ YW K++G LA+LIN D D G AK 
Sbjct: 301 PGLLEALAHTNLNLVLGVDFNQSVAKSAALYWTKQKGQLAELINQVDAGFDSDHLGKEAK 360 

Query: 371 AIIQENYTWKKIVEEYEDLFLNE 393 

AIIQE+YTW+KIV EYE LFLNE 
Sbjct: 361 AIIQEHYTWEKIVGEYEALFLNE 383 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1221 

A DNA sequence (GBSx1297) was identified in S.agalactiae <SEQ ID 3787> which encodes the amino 
acid sequence <SEQ ID 3788>. This protein is predicted to be dTDP-L-rhamnose synthase. Analysis of this 
protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1059 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD10184 GB:AF026471 Cps20 [Streptococcus pneumoniae] 
Identities = 258/283 (91%) , Positives = 274/283 (96%) 


Query: 1 MILITGANGQLGSELRHLLDERTQEYVAVDVAEMDITNAEMVDKVFEEVKPSLVYHCAAY 60 
MILITGANGQLG+ELR+LLDER +EYVAVDVAEMDIT+AEMV+KVFEEVKP+LVYHCAAY 
Sbjct: 1 MILITGANGQLGTELRYLLDERNEEYVAVDVAEMDITDAEMVEKVFEEVKPTLVYHCAAY 60 

Query: 61 TAVDAAEDEGKELDFAINVTGTENVAKAAAKHDATLVYISTDYVFDGEKPVGQEWEVDDL 120 
TAVDAAEDEGKELDFAINVTGT+NVAKA+ KH ATLVYISTDYVFDG+KPVGQEWEVDD 
Sbjct: 61 TAVDAAEDEGKELDFAINVTGTKNVAKASEKHGATLVYISTDYVFDGKKPVGQEWEVDDR 120 

Query: 121 PDPKTEYGRTKRMGEELVEKYTSKFYTIRTAWVFGNYGKNFVFTMQNLAKTHKTLTVVND 180 
PDP+TEYGRTKRMGEELVEK+ S FY IRTAWVFGNYGKNFVFTMQNLAKTHKTLTVVND 
Sbjct: 121 PDPQTEYGRTKI^GEELVEKIWSNFYIIRTAWFG^reGKNEVFTMQ^^^KTHKTLTVVlTO 180 

Query: 181 QHGRPTWTRTLAEFMTYLAENQKDFGYYHLSNDAKEDTTWYDFAVEILKDTDVEVKPVDS 240 
Q+GRPTWTRTLAEFMTYLAEN+K+FGYYHLSNDA EDTTWYDFAVEILKDTDVEVKPVDS 
Sbjct: 181 QYGRPTWTRTLAEFMTYLAENRKEFGYYHLSNDATEDTTWYDFAVEILKDTDVEVKPVDS 240 

Query: 241 SQFPAKAKRPLNSTMSLEKAKATGFVIPTWQDALKEFYKQEVK 283 
SQFPAKAKRPLNSTMSL KAKATGFVIPTWQDAL+EFYKQEV+ 
Sbjct: 241 SQFPAKAKRPLNSTMSLAKAKATGFVIPTWQDALQEFYKQEVR 283 


A related DNA sequence was identified in S.pyogenes <SEQ ID 3789> which encodes the amino acid 
sequence <SEQ ID 3790>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0618 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 227/284 (79%) , Positives = 248/284 (86%) 
Query: 1 MILITGANGQLGSELRHLLDERTQEYVAVDVAEMDITNAEMVDKVFEEVKPSLVYHCAAY 60 

MILITG+NGQLG+ELR+LLDER +YVAVDVAEMDITN + V+ VF +VKP+LVYHCAAY 
Sbjct: 21 MILITGSNGQLGTELRYLLDERGVDYVAVDVAEMDITNEDKVEAVFAQVKPTLVYHCAAY 80 

Query: 61 TAVDAAEDEGKELDFAINVTGTENVAKAAAKHDATLVYISTDYVFDGEKPVGQEWEVDDL 120 

TAVDAAEDEGK L+ AINVTG+EN+AKA K+ ATLVYISTDYVFDG KPVGQEW D 
Sbjct: 81 TAVDAAEDEGKAIJSIEAINVTGSENIAKACGKYGATLVYISTDYVFDGNKPVGQEWVETDH 140 

Query: 121 PDPKTEYGRTKRMGEELVEKYTSKFYTIRTAWVFGNYGKNFVFTMQNLAKTHKTLTVVND 180 

PDPKTEYGRTKR+GE VE+Y FY IRTAWVFGNYGKNFVFTM+ LA+ H LTVVND 
Sbjct: 141 PDPKTEYGRTKRLGEIAVERYAEHFYIIRTAWVFGNYGKNFVFTMEQLAENHSRLTVVND 200 

Query: 181 QHGRPTWTRTLAEFMTYLAENQKDFGYYHLSNDAKEDTTWYDFAVEILKDTDVEVKPVDS 240 
QHGRPTWTRTLAEFM YL ENQK FGYYHLSNDAKEDTTWYDFA EILKD VEV PVDS 

Sbjct: 201 QHGRPTWTRTLAEFMCYLTENQKAFGYYHLSNDAKEDTTWYDFAKEILKDKAVEVVPVDS 260 

Query: 241 SQFPAKAKRPLNSTMSLEKAKATGFVIPTWQDALKEFYKQEVKK 284 
S FPAKAKRPLNSTM+L+KAKATGFVIPTWQ+ALK FY+Q +KK 
Sbjct: 261 SAFPAKAKRPLNSTMNLDKAKATGFVIPTWQEALKAFYQQGLKK 304 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1222 

A DNA sequence (GBSx1298) was identified in S.agalactiae <SEQ ID 3791> which encodes the amino 
acid sequence <SEQ ID 3792>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2554 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA21508 GB:AB000631 unnamed protein product [Streptococcus mutans] 
Identities = 92/108 (85%) , Positives = 100/108 (92%) 

Query: 5 KQYSEEEVGKIKDRILEALEMVIDPELGIDIVNLGLIYEIRFEDNGRTEIDMTLTTMGCP 64 
K Y+ EE+ KIKDRILEALEMVIDPELGIDIVNLGLIY+IRFED+GRTEIDMTLTTMGCP 

Sbjct: 4 KNYTPEEIAKIKDRILEALEMVIDPELGIDIVNLGLIYDIRFEDSGRTEIDMTLTTMGCP 63 

Query: 65 LADLLTDQIHDVMKTVPEVTETEVKLVWYPAWSVDKMSRYARIALGIR 112 
LADLLTDQIHD +K VPEV + +VKLVW PAW+VDKMSRYARIALGIR 
Sbjct: 64 LADLLTDQIHDALKDVPEVLDIDVKLVWSPAWTVDKMSRYARIALGIR 111 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3793> which encodes the amino acid 
sequence <SEQ ID 3794>. Analysis of this protein sequence reveals the following: 

Possible site: 32 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2818 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 90/112 (80%) , Positives = 102/112 (90%) 
Query: 1 MSEVKQYSEEEVGKIKDRILEALEMVIDPELGIDIVNLGLIYEIRFEDNGRTEIDMTLTT 60 

MS+ +Y++++V IK+RILEALE VIDPELGID+VNLGLIYEIRF DNG TEIDMTLTT 
Sbjct: 1 MSDTPKYTQDQVIAIKNRILEALETVIDPELGIDVVNLGLIYEIRFNDNGYTEIDMTLTT 60 

Query: 61 MGCPLADLLTDQIHDVMKTVPEVTETEVKLVWYPAWSVDKMSRYARIALGIR 112 

MGCPLADLLTD IHD ++ VPEVT+TEVKLVWYPAW+VDKMSRYARIALGIR 
Sbjct: 61 MGCPLADLLTDYIHDALQDVPEVTKTEVKLVWYPAWTVDKMSRYARIALGIR 112 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1223 

A DNA sequence (GBSx1299) was identified in S.agalactiae <SEQ ID 3795> which encodes the amino 
acid sequence <SEQ ID 3796>. This protein is predicted to be RNA polymerase sigma factor, sigma-70 
family (rpoD). Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3157 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein is similar to the sigma-42 protein from S.mutans: 

>GP:BAA21507 GB:AB000631 sigma 42 protein [Streptococcus mutans] 
Identities = 345/367 (94%) , Positives = 358/367 (97%) 

Query: 14 EKKGNTTFNVQVADFIRNHKKQGTAIDDEVTEKLVIPFVLDADQIDDLLERLTDGGISIT 73 

+KK ++TFNVQVADFIRNHKK+G A+DDEVTEKLVIPF L+A+QIDDLLERLTDGGISIT 
Sbjct: 5 KKKrSSTFNVQVADFIRNHKKEGVAVDDEVTEKLVIPFELEAEQIDDLLERLTDGGISIT 64 


Query: 74 DKEGNPSTKYVVEGPKPEELTDEELIGSNSAKVNDPVRMYLKEIGVVPLLTNEEEKELAV 133 

D+EGNPSTKY VE KPEELTDEEL+GSNSAKVNDPVRMYLKEIGVVPLLTNEEEKELA+ 
Sbjct: 65 DREGNPSTKYAVEEIKPEELTDEELLGSNSAKVNDPVRMYLKEIGVVPLLTNEEEKELAI 124 

Query: 134 AVAEGDLMAKQRLAEANLRLVVSIAKRYVGRGMQFLDLIQEGNMGLMKAVDKFDYSKGFK 193 

AV GDL AKQRLAEANLRLVVSIAKRYVGRGMQFLDLIQEGNMGLMKAVDKFDYSKGFK 
Sbjct: 125 AVENGDLEAKQRLAEANLRLVVSIAKRYVGRGMQFLDLIQEGNMGLMKAVDKFDYSKGFK 184 

Query: 194 FSTYATWWIRQAITRAIADQARTIRIPVHMVETINKLVREQRNLLQELGQDPTPEQIAER 253 
FSTYATWWIRQAITRAIADQARTIRIPVHMVETINKLVREQRNLLQELGQDPTPEQIAER 

Sbjct: 185 FSTYATWWIRQAITRAIADQARTIRIPVHMVETINKLVREQRNLLQELGQDPTPEQIAER 244 

Query: 254 MDMTPDKVREILKIAQEPVSLETPIGEEDDSHLGDFIEDEVIENPVDYTTRVVLREQLDE 313 
MDMTPDKVREILKIAQEPVSLETPIGEEDDSHLGDFIEDEVIENPVDYTTRVVLREQLDE 
Sbjct: 245 MDMTPDKVREILKIAQEPVSLETPIGEEDDSHLGDFIEDEVIENPVDYTTRVVLREQLDE 304 

Query: 314 VLDTLTDREENVLRLRFGLDDGKMRTLEDVGKVFNVTRERIRQIEAKALRKLRHPSRSKQ 373 

VLDTLTDREENVLRLRFGLDDGKMRTLEDVGKVF+VTRERIRQIEAKALRKLRHPSRSKQ 
Sbjct: 305 VLDTLTDREENVLRLRFGLDDGKMRTLEDVGKVFDVTRERIRQIEAKALRKLRHPSRSKQ 364 






Query: 374 LKDFMED 380 

L+DF+ED 
Sbjct: 365 LRDFVED 371 
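
The middle line of each alignment repeats the residue letter wherever Query and Sbjct are identical and shows "+" for a conservative substitution. The identity part of that line can be recomputed from the two sequence rows alone, as in the Python sketch below. This is an illustration with hypothetical function names; it does not reproduce the "+" marks, because those depend on a substitution matrix that is not given in this text.

```python
def identity_midline(query, sbjct):
    """Return the identity-only middle line for two aligned rows.

    Identical residues are echoed; every other column, including
    gap columns ("-"), is left blank. Conservative substitutions
    ("+") are deliberately not computed here.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        q if q == s and q != "-" else " " for q, s in zip(query, sbjct)
    )


def count_identities(query, sbjct):
    """Number of columns in which both rows carry the same residue."""
    return sum(1 for c in identity_midline(query, sbjct) if c != " ")


# Final block of the alignment above: Query 374-380 against Sbjct 365-371.
query_row = "LKDFMED"
sbjct_row = "LRDFVED"
print(repr(identity_midline(query_row, sbjct_row)))
print(count_identities(query_row, sbjct_row))
```

Summing `count_identities` over every block of an alignment should reproduce the numerator of the "Identities" figure, which gives a simple consistency check on rows affected by text extraction.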



A related DNA sequence was identified in S.pyogenes <SEQ ID 3797> which encodes the amino acid 
sequence <SEQ ID 3798>. Analysis of this protein sequence reveals the following: 



Possible site: 43 

>>> Seems to have no N-terminal signal sequence 
Final Results 

bacterial cytoplasm Certainty=0.1788 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 351/369 (95%), Positives = 364/369 (98%) 

Query: 12 MAEKKGNTTFNVQVADFIRNHKKQGTAIDDEVTEKLVIPFVLDADQIDDLLERLTDGGIS 71 
M ++K TTFNVQVA+FIR+HKK+GTAIDD+VTEKLVIPF LDADQIDDLLERLTDGGIS 

Sbjct: 1 MTKQKEITTFNVQVAEFIRHHKKEGTAIDDDVTEKLVIPFALDADQIDDLLERLTDGGIS 60 

Query: 72 ITDKEGNPSTKYVVEGPKPEELTDEELIGSNSAKVNDPVRMYLKEIGVVPLLTNEEEKEL 131 
ITDKEGNPS+KY+VE PKPEELTDEELIGSNSAKVNDPVRMYLKEIGVVPLLT+EEEKEL 
Sbjct: 61 ITDKEGNPSSKYIVEEPKPEELTDEELIGSNSAKVNDPVRMYLKEIGVVPLLTSEEEKEL 120 

Query: 132 AVAVAEGDLMAKQRLAEANLRLVVSIAKRYVGRGMQFLDLIQEGNMGLMKAVDKFDYSKG 191 

AVAVA+GDLMAKQRLAEANLRLVVSIAKRYVGRGMQFLDLIQEGNMGLMKAVDKFDYSKG 
Sbjct: 121 AVAVAKGDLMAKQRLAEANLRLVVSIAKRYVGRGMQFLDLIQEGNMGLMKAVDKFDYSKG 180 


Query: 192 FKFSTYATWWIRQAITRAIADQARTIRIPVHMVETINKLVREQRNLLQELGQDPTPEQIA 251 

FKFSTYATWWIRQAITRAIADQARTIRIPVHMVETINKLVREQRNLLQELGQDPTPEQIA 
Sbjct: 181 FKFSTYATWWIRQAITRAIADQARTIRIPVHMVETINKLVREQRNLLQELGQDPTPEQIA 240 

Query: 252 ERMDMTPDKVREILKIAQEPVSLETPIGEEDDSHLGDFIEDEVIENPVDYTTRVVLREQL 311 

ERM+MTPDKVREILKIAQEPVSLETPIGEEDDSHLGDFIEDEVIENPVDYTTRVVLREQL 
Sbjct: 241 ERMEMTPDKVREILKIAQEPVSLETPIGEEDDSHLGDFIEDEVIENPVDYTTRVVLREQL 300 

Query: 312 DEVLDTLTDREENVLRLRFGLDDGKMRTLEDVGKVFNVTRERIRQIEAKALRKLRHPSRS 371 
DEVLDTLTDREENVLRLRFGLDDGKMRTLEDVGKVFNVTRERIRQIEAKALRKLRHPSRS 

Sbjct: 301 DEVLDTLTDREENVLRLRFGLDDGKMRTLEDVGKVFNVTRERIRQIEAKALRKLRHPSRS 360 

Query: 372 KQLKDFMED 380 
KQL+DF+ED 
Sbjct: 361 KQLRDFIED 369 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1224 

A DNA sequence (GBSx1300) was identified in S.agalactiae <SEQ ID 3799> which encodes the amino 
acid sequence <SEQ ID 3800>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2853 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1225 

A DNA sequence (GBSx1301) was identified in S.agalactiae <SEQ ID 3801> which encodes the amino 
acid sequence <SEQ ID 3802>. Analysis of this protein sequence reveals the following: 

Possible site: 40 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2198 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 






The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA03516 GB:D14690 DNA primase [Lactococcus lactis] 
Identities = 206/398 (51%) , Positives = 294/398 (73%) , Gaps = 6/398 (1%) 

Query: 37 LAIDKEKISEIKNSVNIVDVIGEVVGLTKTGRNHLGLCPFHKEKTPSFNVIEDRQFFHCF 96 

+++D E ++++K+ VNI D+I + V L++TG+N++GLCPFH EKTPSFNV ++ F+HCF 
Sbjct: 2 VSLDTEVVNDLKSKVNIADLISQYVALSRTGKNYIGLCPFHGEKTPSFNVNAEKGFYHCF 61 

Query: 97 GCGRSGDVFKFVEDYQHISFLDSVQVLAERSGIPLDTNFKGQVPKKPKANQSLLDIHRVA 156 

GCGRSGD +F+++Y + F+D+V+ LA+ +G+ L N +K N L +I+ A 

Sbjct: 62 GCGRSGDAIEFLKEYNQVGFVDAVKELADFAGVTL--NISDDREEKNNPNAPLFEINNQA 119 

Query: 157 SGFYHAYLMTTNDGERARQYLAERGVTEDLIKHFQIGLSPGGQDFLYRRLAKEFDEKTLM 216 
+ Y+ LM+T GERAR+YL ERG+T+D+IK F IGL+P DF+++ L+ +FDE+ + 

Sbjct: 120 ARLYNILLMSTELGERARKYLEERGITDDVIKRFNIGLAPEENDFIFKNLSNKFDEEIMA 179 

Query: 217 SSGLFNYSENSNQFYDSFNNRIMFPLTND1GEVIAFSGRVWTQEDIDRKQAKYKNSRATP 276 
SGLF++S +N+ +D+F NRIMFP+TN+ G+ I FSGR W QE+ D K AKY N+ AT 
30 Sbjct: 180 KSGLFHFS - -NNKVFDAFTNRIMFPITNEYGQTIGFSGRKW-QENDDSK-AKYINTSATT 235 

Query: 277 IFNKSYELYHLDKARAVINKAHEVYL^GFMDVIAAYRAGIENVVASMGTALTNEHVRHL 336 

IF+KSYEL++LDKA+ I+K HEVYLMEGFMDVTA+Y+AGI NWASMGTALT +HVR h 
Sbjct: 236 IFDKSYELWNLDKAKPTISKQHEVYLMEGFMDVIASYKAGINNWASMGTALTEKHVRRL 295 

35 

Query: 337 KRFTKKWLTYDGDRAGQNAIDKSLELLSDMTVDIVRIPNKMDPDEFLQANSAEDFKQLL 396 

K+ KK VL YDGD AGQNAI K+++L+ + V IV++P +DPDE+ + + L+ 
Sbjct: 296 KQMAKKFVLVYDGDSAGQNAIYKAIDLIGESAVQIVKVPEGLDPDEYSKNYGLKGLSALM 355 

40 Query: 397 ENGRISNTEFYIHYLKPENTDNLQSEIAYVEKIAKLIA 434 

E GRI EF I YL+PEN NLQ+++ ++E+I+ +IA 
Sbjct: 356 ETGRIQPIEFLIDYLRPENLANLQTQLDFIEQISPMIA 393 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3803> which encodes the amino acid 
sequence <SEQ ID 3804>. Analysis of this protein sequence reveals the following:

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3532 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 378/604 (62%) , Positives = 477/604 (78%) , Gaps = 2/604 (0%)

Query: 28 MGYFCG<3HDLAIDKEKISEIKNSVNIvDVIGEvVGLTKTGRNHLGLCPFHKEKTPSFNVI 87
          MG+ GG DLAIDKE IS++KNSVNIVDVIGEW L+++GR++LGLCPFHKEKTPSFNV+
Sbjct: 1 MGFLWGGDDmiDKEMISQVKNSVNIVDVIGEVVKLSRSGRHYLGLCPFHKEKTPSFNVV 60

Query: 88 EDRQFFHCFGCGRSGDVFKFVEDYQHISFLDSVQVLAERSGIPLDTNFKGQV--PKKPKA 145
          EDRQFFHCFGCG+SGDVFKF+E+Y+ + FL+SVQ++A+++G+ L+ V +
Sbjct: 61 EDRQFFHCFGCGKSGDVFKFIEEYRQVPFI1ESVQIIADKTGMSI1NIPPSQAVLASQHKHP 120

Query: 146 NQSLLDIHRVASGFYHAYLMTTNDGERARQYLAERGVTEDLIKHFQIGLSPGGQDFLYRR 205
           N +L+ +H A+ FYHA LMTT G+ AR+YL +RG+ + LI+HF IGL+P D+LY+
Sbjct: 121 NHALMTLHEDAAKFYHAVLMTTTIGQEARKYLYQRGLDDQLIEHFNIGIAPDESDYLYQA 180

Query: 206 IAKEFDEKTLMSSGLFNYSENSNQFYDSFNNRIMFPLTNDIGEVIAFSGRVWTQEDIDRK 265
           L+K+++E L++SGLF+ S+ SN YD+F NRIMFPL++D G +IAFSGR+WT D++++
Sbjct: 181 LSKKYEEGQLVASGLFHLSDQSNTIYDAFRNRIMFPLSDDRGHIIAFSGRIWTAADMEKR 240

Query: 266 QAKYKNSRATPIFNKSYELYHLDKARAVINKAHEVYLMEGFMDVIAAYRAGIENWASMG 325
           QAKYKNSR T +FNKSYELYHLDKAR VI K HEV+LMEGFMDVIAAYR+G EN VASMG
Sbjct: 241 QAKYKNSRGTVLFNKSYELYHLDKARPVIAKTHEVFLMEGFMDVIAAYRSGYEMAVASMG 300

Query: 326 TALTITOHWHLKRFTKKVVLTYDGDRAGQNAIDKSLELLSDMTVDIVRIPNKMDPDEFLQ 385
           TALT EHV HLK+ TKKWL YDGD AGQ+AI KSLELL D V+IVRIPNKMDPDEF+Q
Sbjct: 301 TALTQEHVNHLKQVTKKVVLIYDGDDAGQHAIAKSLELLKDFVVEIVRIPNKMDPDEFVQ 360

Query: 386 AHSAEDFKQLLENGRISNTEFYIHYLKPENTDNLQSEIAYVEKIAKLIAKSPSITAQNSY 445
           +S E F LL+ RIS+ EF+I YLKP N DNLQS+I YVEK+A LIA+SPSITAQ+SY
Sbjct: 361 RHSPEAFADLLKQSRISSVEFFIDYLKPTNVDNLQSQIVYVEKMAPLIAQSPSITAQHSY 420

Query: 446 ITKVAELLPDFDYFQVEQSVNNERLHHRSQQQASSSVQTSATVQLPQTGKLSAITKTEMQ 505
           I K+A+LLP+FDYFQVEQSVN R+ R + Q + S V LP L+AI KTE
Sbjct: 421 INKIADLLPNFDYFQVEQSVNALRIQDRQKHQGQIAQAVSNLVTLPMPKSLTAIAKTESH 480

Query: 506 LFHRLLNHPYLLNEFRNRDNFYFDTTEIQVLYELLKESGEITSYDLSQESDKVNRTYYII 565
           L HRLL+H YLLNEFR+RD+FYFDT+ +++LY+ LK+ G ITSYDLS+ S++VNR YY +
Sbjct: 481 LMHRLLHHDYLLNEFRHRDDFYFDTSTLELLYQRLKQQGHITSYDLSEMSEEVNRAYYNV 540

Query: 566 LEEQLPVEVSIGEIEAVEKARDRLLKERDLRKQSQLIRQSSNQGDEEGALAALENLIAQK 625
           LEE LP EV++GEI+ + R +LL ERDL KQ + +R+SSN+GD + AL LE+ IAQK
Sbjct: 541 LEENLPKEVALGEIDDILSKRAKLLAERDLHKQGKKVRESSNKGDHQAALEVLEHFIAQK 600

Query: 626 RNME 629
           R ME
Sbjct: 601 RKME 604


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1226 

A DNA sequence (GBSx1302) was identified in S.agalactiae <SEQ ID 3805> which encodes the amino
acid sequence <SEQ ID 3806>. Analysis of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -6.05 Transmembrane 41 - 57 ( 34 - 58) 
INTEGRAL Likelihood = -5.79 Transmembrane 93 - 109 ( 90 - 112) 

Final Results 

bacterial membrane --- Certainty=0.3421 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9995> which encodes amino acid sequence <SEQ ID 9996> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC38560 GB:AF029731 large conductance mechanosensitive channel 
[Staphylococcus aureus] 
Identities = 64/126 (50%) , Positives = 83/126 (65%) , Gaps = 8/126 (6%)

Query: 23 MIKELKEFLFKGNVLDLAVAVILGAAENAIITSLVKDVITPLIIiNPVLKAAGVSNIA-QL 81
          M+KE KEF KGJWLDLA+AV++GAAFN II+SLV+++I PLI K G + A +
Sbjct: 1 MLKEFKEFALKGNVLDLAIAVVMGAAFNKIISSLVENIIMPLI GKI FGSVDFAKEW 56

Query: 82 SWNGVAYGNFLSAVINFLIVGTTLFFIVKAAfHCVMAKKPAEEEIIEVVEPTQEQLIiAEIR 141
          S+ G+ YG F+ +VI+F+I+ LF VK AN +M K+ AEE E V LL EIR
Sbjct: 57 SFWGIKYGLFIQSVIDFIIIAFALFIFVKIANTLMKKEEAEE EAWEENWLLTEIR 113

Query: 142 DLLANK 147
           DLL K
Sbjct: 114 DLLREK 119

A related DNA sequence was identified in S.pyogenes <SEQ ID 3807> which encodes the amino acid
sequence <SEQ ID 3808>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.95 Transmembrane 71 - 87 ( 67 - 90)

Final Results

bacterial membrane --- Certainty=0.3378 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15653 GB:Z99122 similar to large conductance mechanosensitive 
channel protein [Bacillus subtilis] 
Identities = 51/126 (48%), Positives = 77/126 (60%), Gaps = 7/126 (5%) 

Query: 1 MVKELKAFLFRGNIIEIiAVAVIIGGAFGAIVTSFVNDIITPLIIjNPALKAANVENITQLS 60
         M E KAF RGNI++LA+ V+IGGAFG IVTS VNDII PL+ L + ++
Sbjct: 1 MWNEFKAFAMRGNIVDIAIGVVIGGAFGKIVTSLViroilMPLV-GILLLGGIjDFSGLSFTF 59

Query: 61 WNG- VKYGSFLGAVINFLI IGTSLFFWKAAEKAMPKKE KEAAAPTQEELLTEIR 114
          + VKYGSF+ ++NFLII S+F V++ KKE E A QEELL EIR
Sbjct: 60 GDAWKYGSFIQTIVNFLIISFSIFIVIRTLNGLRRKKEAEEEAAEEAVDAQEELLKEIR 119

Query: 115 DLLAQK 120
           DLL Q+
Sbjct: 120 DLLKQQ 125

An alignment of the GAS and GBS proteins is shown below. 

Identities = 86/125 (68%), Positives = 99/125 (78%), Gaps = 5/125 (4%) 

Query: 23 MIKELKEFLFKGNVLDLAVAVILGAAFNAIITSLVKDVITPLILNPVLKAAGVSNIAQLS 82
          M+KELK FLF+GN+ + +LAVAVI +G AF AI+TS V D+ITPLILNP LKAA V NI QLS
Sbjct: 1 ^WKELKAFLFRGNIIE]^VAVIIGGAFGAIOTSFvNDIITPLILNPALKAANVENITQLS 60

Query: 83 WNGVAYGNFLSAVINFLIVGTTLFFIVKAA1SKVMAKKPAEEEIIEVVEPTQEQLIAEIRD 142
          WNGV YG+FL AVINFLI +GT+LFF+VKAA K M KK E PTQE+LL EIRD
Sbjct: 61 WNGVKYGSFLGAVINFLI IGTSLFFWKAAEKAMPKKEK EAAAPTQEELLTEIRD 115

Query: 143 LLANK 147
           LLA K
Sbjct: 116 LLAQK 120

A related GBS gene <SEQ ID 8753> and protein <SEQ ID 8754> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10 
SRCFLG: 0 

McG: Length of UR: 4
Peak Value of UR: 2.96
Net Charge of CR: 1
McG: Discrim Score: 4.39 
GvH: Signal Score (-7.5): -1.79 

Possible site: 25 
>>> Seems to have a cleavable N-term signal seq. 
Amino Acid Composition: calculated from 26 
ALOM program count: 1 value: -5.79 threshold: 0.0 

INTEGRAL Likelihood = -5.79 Transmembrane 71 - 87 ( 68 - 90) 
PERIPHERAL Likelihood =1.06 28 
modified ALOM score: 1.66 
icml HYPID: 7 CFP: 0.331 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.3314 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF00541(367 - 741 of 1041) 

SP|068285|MSCL_STAAU(1 - 119 of 120) LARGE-CONDUCTANCE MECHANOSENSITIVE CHANNEL.
GP|3135292|gb|AAC38560.1| |AF029731 large conductance mechanosensitive channel
{Staphylococcus aureus} 
%Match =14.9 

%Identity =53.3 %Similarity =70.5 

Matches = 65 Mismatches = 31 Conservative Sub.s = 21 

177 207 237 267 297 327 357 387 

QVMTSTEITHYSFTFDYIIFSFLCKFFQKLFQGFLLH*FNIKIYR*FETYYLDFSKEICYNERE]^IKELvHMIKELKE 

MM 

MLKEFKE 

417 447 477 507 537 561 591 621 

FLFKGNVLDLAVAVIIiGAAFNAIITSLvTOVITPLILNFVLKftAGV^ 

I :||!!lllhl]: = lllll ll=ll|:==l III 11= 1= 1= 1= II 1= =11 = 1 = 1= II 

FALKGNVLDLAIAWMGAAFNKI ISSLVENIIMPLI GKIFGSVDFAK-EWSFWGIKYGLFIQSVIDFI I IAFALFI 

20 30 40 50 60 70 80 

651 681 711 741 771 801 831 861 

I VT<AANKvMAKKPXEEEIIEvVEPTQEQLl4XEIRDLLANK**KTRITEFFY*LIVIIYEKTAQF*TVFSYSI*LEF 

II II =1 1= III III II llllll I 

FVKIANTLMKKEEAEEE - - AWE - ENWLLTE IRDLLREKK 
100 110 120 

SEQ ID 8754 (GBS354) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 74 (lane 3; MW 17kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1227 

A DNA sequence (GBSx1303) was identified in S.agalactiae <SEQ ID 3809> which encodes the amino
acid sequence <SEQ ID 3810>. This protein is predicted to be a 30S ribosomal protein S21-related protein.
Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.6479 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9391> which encodes amino acid sequence <SEQ ID 9392> 
was also identified. A related GBS nucleic acid sequence <SEQ ID 10799> which encodes amino acid 
sequence <SEQ ID 10800> was also identified.

The protein is similar to the 30S ribosomal protein S21 from Listeria monocytogenes: 

>GP:BAA82793 GB:AB023064 30S ribosomal protein S21 [Listeria monocytogenes] 
Identities = 30/34 (88%) , Positives = 34/34 (99%) 

Query: 1 MTKAGTLQESRKREFYEKPSVKRKRKSEAARKRK 34
         ++K+GTLQESRKREFYEKPSVKRK+KSEAARKRK
Sbjct: 23 VSKSGTLQESRKREFYEKPSVKRKKKSEAARKRK 56 
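
The "Identities" figure reported for an alignment such as the one above can be checked by counting the columns in which the query and subject residues are the same. The sketch below is illustrative only: the function name `alignment_stats` is hypothetical, and truncating the percentage to an integer is an assumption rather than a documented property of the alignment program.

```python
def alignment_stats(query, subject):
    """Count identical columns in two equal-length aligned sequences."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    # A gap character ("-") never counts as an identity.
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identities, len(query)

# Aligned rows copied from the Listeria monocytogenes S21 alignment above.
query = "MTKAGTLQESRKREFYEKPSVKRKRKSEAARKRK"
subject = "VSKSGTLQESRKREFYEKPSVKRKKKSEAARKRK"

identities, length = alignment_stats(query, subject)
percent = 100 * identities // length  # integer percentage (assumed truncation)
print(f"Identities = {identities}/{length} ({percent}%)")
# prints "Identities = 30/34 (88%)"
```

Counting "Positives" would additionally require the substitution matrix used by the alignment program, which is not stated here, so it is left out of the sketch.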

A related DNA sequence was identified in S.pyogenes <SEQ ID 3811> which encodes the amino acid 
sequence <SEQ ID 3812>. Analysis of this protein sequence reveals the following:

Possible site: 38 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4815 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 35/36 (97%) , Positives = 36/36 (99%)

Query: 1 MTKAGTLQESRKREFYEKPSVKRKRRSEAARKRKKF 36
         +TKAGTLQESRKREFYEKPSVKRKRKSEAARKRKKF
Sbjct: 35 TCKAGTLQESRKREFYEKPSVKRKRKSEAARKRKKF 70

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1228 

A DNA sequence (GBSx1304) was identified in S.agalactiae <SEQ ID 3813> which encodes the amino
acid sequence <SEQ ID 3814>. Analysis of this protein sequence reveals the following:

Possible site: 18 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.06 Transmembrane 5 - 21 ( 3 - 23)
INTEGRAL Likelihood = -2.28 Transmembrane 191 - 207 ( 189 - 207)

Final Results

bacterial membrane --- Certainty=0.3824 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8755> and protein <SEQ ID 8756> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2
McG: Discrim Score: 8.68
GvH: Signal Score (-7.5): -5.71
Possible site: 18
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 2 value: -7.06 threshold: 0.0 

INTEGRAL Likelihood = -7.06 Transmembrane 5 - 21 ( 3-23) 
INTEGRAL Likelihood = -2.28 Transmembrane 191 - 207 ( 189 - 207) 
PERIPHERAL Likelihood = 4.35 142 
modified ALOM score: 1.91 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.3824 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8756 (GBS259) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 45 (lane 4; MW 54kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1229 

A DNA sequence (GBSx1305) was identified in S.agalactiae <SEQ ID 3815> which encodes the amino
acid sequence <SEQ ID 3816>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.38 Transmembrane 136 - 152 ( 135 - 152) 



Final Results 

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD47593 GB:AF140784 Vexp2 [Streptococcus pneumoniae] 
Identities = 117/212 (55%) , Positives = 152/212 (71%) 

Query: 1 MLELKNIAYRYKGNDNKTLENINYSFQSGVFYTILGNSGSGKTTLLSLMAGLDSPTEGQV 60
         +L+L+++ YRYK L INY+F+ G FY+I+G SG+GK+TLLSL+AGLDSP EG +
Sbjct: 3 LLQLQDVTYRYKNTAEAVLYQINYNFEPGKFYSIIGESGAGKSTLLSLLAGLDSPVEGSI 62

Query: 61 LFNKKDIKEAGYAQHRKKNIALVFQNYNLLDYLTPLENVQLVKPTADKQLLLDLGLKEDM 120
          LF +DI++ GY+ HR +I+LVFQNYNL+DYL+PLEN++LV A K LL+LGL E
Sbjct: 63 LFQGEDIRKKGYSYHRMHHISLVFQNYNLIDYLSPLENIRLVNKKASKNTLLELGLDESQ 122

Query: 121 LTRNILRLSGGQQQRVAIARALWGTPAILLDEPTGNLDFDISRDITMRLKDFAHKEKRC 180
           + RN+L+LSGGQQQRVAIAR+LV P IL DEPTGNLD + DI LK A K +C
Sbjct: 123 IKRNVLQLSGGQQQRVAIARSLVSEAPVILADEPTGNLDPKTAGDIVELLKSLAQKTGKC 182

Query: 181 VIMVTHSREIAHMADTALQLIGDNLKELSKES 212
           VI+VTHS+E+A +D L+L L E S
Sbjct: 183 VIWTHSKEVAQASDITLELKDKKLTETRNTS 214

SEQ ID 3816 (GBS363) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 74 (lane 5; MW 28kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 10; MW 53kDa). 

GBS363-GST was purified as shown in Figure 216, lane 9.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1230 

A DNA sequence (GBSx1306) was identified in S.agalactiae <SEQ ID 3817> which encodes the amino
acid sequence <SEQ ID 3818>. This protein is predicted to be Vexp3. Analysis of this protein sequence
reveals the following: 

Possible site: 47 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-14.97 Transmembrane 71 - 87 ( 66 - 97)
INTEGRAL Likelihood = -3.61 Transmembrane 2 - 18 ( 1 - 18)

Final Results

bacterial membrane --- Certainty=0.6986 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1231 

A DNA sequence (GBSx1307) was identified in S.agalactiae <SEQ ID 3819> which encodes the amino
acid sequence <SEQ ID 3820>. This protein is predicted to be Vexp3. Analysis of this protein sequence 
reveals the following: 

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1986 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1232 

A DNA sequence (GBSx1308) was identified in S.agalactiae <SEQ ID 3821> which encodes the amino
acid sequence <SEQ ID 3822>. This protein is predicted to be Vexp3. Analysis of this protein sequence
reveals the following:

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -6.05 Transmembrane 22 - 38 ( 17 - 39) 

Final Results

bacterial membrane --- Certainty=0.3421 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD47594 GB:AF140784 Vexp3 [Streptococcus pneumoniae] 
Identities = 39/153 (25%) , Positives = 67/153 (43%) , Gaps = 9/153 (5%) 

Query: 3 LFKRSFLYVSRKKRKSITLFVCLWLVASTLISGIAVKNAGLTA-KKTFSRQTGSILHISS 61
         + +F YV+RK KSI +F+ + L+AS + G+++K A A ++TF T S +
Sbjct: 1 MLHNAFAYVTRKFFKSIVIFLIILLMASLSLVGLSIKGATAKASQETFKNITNS-FSMQI 59

Query: 62 DSTDLVGDGYGSGEIPEKAIVNIASNPNVKRvNNNLMAYAGLTSEKMVTRPNDKEQYKE- 120
          + G G+G I + I I N ++ + A LT ++ P K+
Sbjct: 60 NRRVNQGTPRGAGNIKGEDIKKITENKAIESYVKRINAIGDLTGYDLIETPETKKNLTAD 119

Query: 121 QVLQVHGNSYSDTDPKYTAGMISLKGG 147
           L+G+S +K++G L G
Sbjct: 120 RAKRFGSSLMITGVNDSSKEDKFVSGSYKLVEG 152

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1233 

A DNA sequence (GBSx1309) was identified in S.agalactiae <SEQ ID 3823> which encodes the amino
acid sequence <SEQ ID 3824>. Analysis of this protein sequence reveals the following: 

Possible site: 39

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-15.76 Transmembrane 295 - 311 ( 287 - 317)
INTEGRAL Likelihood = -7.59 Transmembrane 49 - 65 ( 46 - 69)
INTEGRAL Likelihood = -6.90 Transmembrane 340 - 356 ( 339 - 362)
INTEGRAL Likelihood = -5.57 Transmembrane 411 - 427 ( 404 - 430)

Final Results

bacterial membrane --- Certainty=0.7305 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9695> which encodes amino acid sequence <SEQ ID 9696> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12182 GB:Z99106 similar to transporter [Bacillus subtilis]
Identities = 95/370 (25%) , Positives = 167/370 (44%) , Gaps = 41/370 (11%)

Query: 109 ESVEASLSIDVGSRLKSVSPYNSS KEENQVTLAGYQSTEDLRAFQTKALVLK 160
           +++E+S S D S S + NS + +++ G ST + F +
Sbjct: 115 DAIESSSSSDSSSSSSSSNAKNSQGGGQGGPQMVQADLSIEGVISTALVDDFSDGDSKIT 174

Query: 161 KGSHLAADNT--KQVLVPLKLAQKNHLSVGNKLRLGK ENVT IAGIYDANSA-- 209
           G + + K ++ LA++N LSVG+ + + E+ T I GIY S+
Sbjct: 175 DGRAITKSDVGKKVTVINETLAEENDLSVGDSITIESATDEDTTVKLKIVGIYKTTSSGD 234

Query: 210 -KSKNTFNPNIDNTLIAQATLvRKISKQKGYQTV AVRLSDKRLVDTVIQNIKQWPLD 265
           +++N NNL T + T+ +D + +DT ++ K+ +D
Sbjct: 235 DQAQNFSFIjNPYNKLYTPYTATAALKGDDYKNTIDSAVYYMDDAKNMDTFVKAAKKTSID 294

Query: 266 FGKLDVQTAKEFYGDSYRNIETLHRLVGRIILIVSLvAMAILWMLTFWINNRIKETGIL 325
           F + T + Y IE + ++ +VS+ IL +++ IRE G+L
Sbjct: 295 FDTYTimOTJQLYQQMVGPIENVASFSKNVVYLVSVAGAVILGLIvmSIRERKYEMGVL 354

Query: 326 LAIGKTKFE I IGHYLI EVLLVAGAAFTLS I IGGVFLGKTFAAGLLSQV 373
           +AIG+ ++++IG +L E+L+VA A L+ + G + LLSQ
Sbjct: 355 MAIGEKRWKLIGQFLTEILIVAVIAIGLASVTGNLVANQLGNQLLSQQISSSTDSTQTAS 414

Query: 374 NGGVSSQIVQNSSLI IDRIDNLAVSVGVMDVFRLYAQGALI CLFAWLSSYS IL 427
           GG+ ++ +SS +D ID+L V+V + D+ h G LI + A +L S S+L
Sbjct: 415 GQMPGGGGGMGGKMFGHSSSNVDVIDSLNVAVSMNDMLILGGIGILIAIIATLLPSISVL 474

Query: 428 KLQPKQILSR 437
           +L PK IL++
Sbjct: 475 RLHPKTILTK 484

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8757> and protein <SEQ ID 8758> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 

McG: Discrim Score: 1.50 

GvH: Signal Score (-7.5): -8.43 
Possible site: 39 

>>> Seems to have an uncleavable N-term signal seq 

ALOM program count: 4 value: -15.76 threshold: 0.0 

INTEGRAL Likelihood =-15.76 Transmembrane 295 - 311 ( 287 - 317) 
INTEGRAL Likelihood = -7.59 Transmembrane 49 - 65 ( 46 - 69) 
INTEGRAL Likelihood = -6.90 Transmembrane 340 - 355 ( 339 - 362) 
INTEGRAL Likelihood = -5.57 Transmembrane 411 - 427 ( 404 - 430) 
PERIPHERAL Likelihood = 3.45 386 
modified ALOM score: 3.65 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.7305 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF00687(421 - 1611 of 1917) 

EGAD|108957|BS0375(11 - 484 of 486) hypothetical protein {Bacillus subtilis}
OMNI|NT01BS0429 membrane transport protein GP|1805444|dbj|BAA09006.1| |D50453 homologue of
hypothetical protein in a rapamycin synthesis gene cluster of Streptomyces hygroscopicus
{Bacillus subtilis} GP|2632675|emb|CAB12182.1| |Z99106 similar to transporter {Bacillus
subtilis} PIR|F69762|F69762 transporter homolog yell - Bacillus subtilis
%Match =8.6 

%Identity =28.7 %Similarity =52.2 

Matches = 117 Mismatches = 184 Conservative Sub.s = 96 

312 342 372 402 432 462 492 522 

VL*NH*LIDNVEVDREYLTTSIVILEIIKIEKGGKIWL^ 

:| :| ||, ::|| | ::| ::|| : :| 
MNFIKRAFWNMKAKKGKTLLQLFVFTVICVFVLSGLAIQSARQK 
10 20 30 40 

543 573 603 624 654 

N ILTKQGKSIYLTSKEKAYWPEQAYEALKK AKMVESVEASLSID 

: I |: I : =1 |: I =:=l=l I I 

SSEIARQELGGSVTLQVDRQKQMEKQQDSGEKRTFESTPIKVSDANKLAALDHVKSYNYTTSASANAGNFDAIESSSSSD 

60 70 80 90 100 110 120 

684 720 750 780 807 834 864 

VGSRLKSVSPYNSS KEENQWLAGYQSTEDLRAFQTKALvLKKGSHLA-ADNTKQV-LVPLKLAQKNHLSVG 

I I : II : ::: 1 || : | : | : :| |:| :: ||::| ||]| 

SSSSSSSSNAKNSQGGGQGGPQMVQADLSIEGVISTALVDDFSIXSDSKITDGRAITKSDVGKKvTVINETLAEENDLSVG 
140 150 160 170 180 190 200

885 903 954 978 1008 1065

NKLRL GKENVTI AGIYDANSA- - -KSKNTFNPNIDNTLIA- -QATLVRKISKQKGYQTVAVR-LSDKRLVDTV 

1=1 = III |: :::| III III I II = I = =11 

DSITIESATDEDTTVKLKIVGIYKTTSSGDDQAQNFSFI^PYNKLYTPYTATAALKGDDYKISrriDSAVYYMDDAKNMDTF 
220 230 240 250 260 270 280 

1095 1125 1155 1185 1215 1245 1275 1305 

IQNIKQWPLDFGKLDVQTAKEFYGDSYRNIETLHRLVGRIILIVSLVAMAILVVMLTFWINNRIKETGILLAIGKTKFEI 

=: |: =11 = I ::| II = = = = =lh II = = = III 1=1=111= = = = = 

VKAAKKTSIDFDTYTLNITOQLYQQMVGPIENVASFSKN^ 

300 310 320 330 340 350 360 

1335 1365 1395 1431 1461 1491 

IGHYLIEVLLVAGAAFTLS I IGGVFLGKTFAAGLLSQV NGGVSSQIVQNSSLI IDRIDNLAV 

l|:=l 1=1=11 I 1= = I == = llll 11= == =11 =1 11=1 I 

IGQFLTEILIVAVIAIGLASVTGNLVANQLGNQLLSQQISSSTDSTQTASGQMPGGGGGMGGKMFGHSSSNVDVIDSLNV 
380 390 400 410 420 430 440 

1521 1551 1581 1611 1641 1671 1701 1731 

SVGVMDVFRLYAQGALICLFAVVLSSYSILKLQPKQILSRMS*EVNMNLFKRSFLWSRKKRKSITLFVCLWLVASTLIS 

= 1 = l = = I I II = I =1 I 1 = 1 = 1 = 11 lh: 
AVSMNDMLILGGIGILIAIIATLLPSISVLRLHPKTILTKQE 

460 470 480 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1234 

A DNA sequence (GBSx1310) was identified in S.agalactiae <SEQ ID 3825> which encodes the amino
acid sequence <SEQ ID 3826>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have a cleavable N-term signal seq.

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11993 GB:Z99105 ybdG [Bacillus subtilis] 
Identities = 66/224 (29%) , Positives = 102/224 (45%) , Gaps = 22/224 (9%) 

Query: 84 IKEYGQKVEVKGKKMNWTVGEGKVPIVFIPGQGTVTAKHQYHNLISNLSKTHKVWVEP 143
          +K G V+V GKKMNVY G GK VF+ G G ++ h S SK +K+ W+
Sbjct: 41 LKGKGTVVDVDGKKMNVYQEGSGKDTFVFMSGSGIAAPAYEMKGLYSKFSKENKIAVVDR 100

Query: 144 FGSGLSDVIDQPRNLANITSDIHEALQKVGITGKYVIASHSIGGVYALKYISTYPKEVLG 203
           G G S+V R++ + +AL K G Y++ HSI G+ A+ + YPKE+
Sbjct: 101 AGYGYSEVSHDDRDIDTVLEQTRKALMKSGNKPPYILMPHSISGIEAMYWAQKYPKEIKA 160

Query: 204 LIGLDTSTP GMEGGKQVDF AAPVLKELPKTPKVSDDIN 241
           +1 +D P G++ K F +A E+ + ++D+
Sbjct: 161 I IAMDIGLPQQYVTYKLSGVDRLKVRGFHLIiTSIGFHRFI PSAVYNPEVIRQSFLTDEEK 220

Query: 242 AQFFAIGHKILNNSNMICEEAKNSSNMINESANYKIPKGIPAMYL 285
           + AI K N++M+ E S ++S N PK P + L
Sbjct: 221 EIYKAINFKQFFNADMEHELLQSYQNGSKSVNLPAPKETPVLIL 264



No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 3826 (GBS121) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 24 (lane 9; MW 40kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 31 (lane 6; MW 65kDa). 

GBS121-GST was purified as shown in Figure 198, lane 6. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1235 

A DNA sequence (GBSx1311) was identified in S.agalactiae <SEQ ID 3827> which encodes the amino
acid sequence <SEQ ID 3828>. Analysis of this protein sequence reveals the following: 

Possible site: 33

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8759> which encodes amino acid sequence <SEQ ID 8760> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 3.70
GvH: Signal Score (-7.5): -0.0600004
Possible site: 22
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 8.01 threshold: 0.0

PERIPHERAL Likelihood = 8.01 167 
modified ALOM score: -2.10 

*** Reasoning Step: 3 

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8760 (GBS60) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 16 (lane 7; MW 38.6kDa). 

GBS60-His was purified as shown in Figure 193, lane 3. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1236 

A DNA sequence (GBSx1312) was identified in S.agalactiae <SEQ ID 3829> which encodes the amino
acid sequence <SEQ ID 3830>. This protein is predicted to be unnamed protein product. Analysis of this
protein sequence reveals the following:

Possible site: 21 

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9693> which encodes amino acid sequence <SEQ ID 9694> 
was also identified. 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8761> and protein <SEQ ID 8762> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: 19 Crend: 5 
McG: Discrim Score: 9.85 
GvH: Signal Score (-7.5): -0.28 

Possible site: 21 
>>> May be a lipoprotein

ALOM program count: 0 value: 9.07 threshold: 0.0 
PERIPHERAL Likelihood =9.07 99 
modified ALOM score: -2.31 



Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

37.0/57.2% over 118aa
Bacillus subtilis
EGAD|108627| hypothetical protein Insert characterized
GP|2632485|emb|CAB11993.1| |Z99105 ybdG Insert characterized
PIR|D69747|D69747 hypothetical protein ybdG - Insert characterized

ORF00608(553 - 906 of 1416)

EGAD|108627|BS0200(51 - 169 of 296) hypothetical protein {Bacillus subtilis}
GP|2632485|emb|CAB11993.1| |Z99105 ybdG {Bacillus subtilis} PIR|D69747|D69747 hypothetical
protein ybdG - Bacillus subtilis
%Match =8.7 

%Identity =37.0 %Similarity =57.1 

Matches = 44 Mismatches = 50 Conservative Sub.s = 24 

339 369 399 429 459 489 519 549 

ITKLSTVALSLLLCTACAASNTSTSKTQSHHPKQTKLTDKQKEEPKNKEAADQEMHPQGAvDLTKYKAKPVKDYGKKIDV 



MKTLWKVLKIVFVSLAALVLLVSVSVFIYHHFQLNKEAALLKGKGTVVD 
10 20 30 40 

579 609 639 669 699 729' 759 789 

GDGKKMNIYETGQGKIPIVFIPGQAEISPRYAYKNLIERLSKKyKIYTVEPLGYGLSDIPTKPRTLENITKEIHTGLNKI 

111111=1: I II 11= I =1111 :=lh II 1= III l== I == = == II 

TOGKKMNWQEGSGKDTFVFMSGSGIAAPAYEMKGLYSKFSKENKIAVvDRRGYGYSEVSHDDRDIDTVLEQTRKALMKS 

60 70 80 90 100 110 120 



816 846 876 906 936 966 996 1026 

GVTOtfFY-IAAHSLGGMYSLNYAKNYPEEWGFIGMDTSTPWMEGEQKTKYDPE 

II II 11= 1= == =1= lhl = : I II I I 

GNKPPYILMPHSISGIEAMYWAQKYPKEIKAI IAMDIGLPQQYVTYKLSGVDRLKVRGFHLLTSIGFHRFIPSAVYNPEV 
140 150 160 170 180 190 200 

SEQ ID 8762 (GBS21) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 11 (lane 3; MW 31.6kDa). 



GBS21-His was purified as shown in Figure 192, lane 11.

GBS21L was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 124 (lane 8-10; MW 66.5kDa). It was also expressed in E.coli as a His-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 124 (lane 11; MW 41.5kDa) and in Figure 180
(lane 6; MW 41kDa). GBS21L-His was purified as shown in Figure 232 (lanes 3 & 4).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1237 

A DNA sequence (GBSx1313) was identified in S.agalactiae <SEQ ID 3831> which encodes the amino
acid sequence <SEQ ID 3832>. This protein is predicted to be endopeptidase O. Analysis of this protein
sequence reveals the following:

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 

Final Results 

15 bacterial cytoplasm Certainty=0 . 3854 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF67832 GB:AF179267 endopeptidase PepO2 [Lactococcus lactis]

Identities = 21/36 (58%), Positives = 26/35 (71%)

Query: 1   MRANIPVRNFQEFYDAFGVKKGDSMYLKPEKRLTLW 36
           +RANIP  N +EFY+ F VK+ D MY  PEKRL +W
Sbjct: 592 LRANIPPTNLEEFYETFDVKETDQMYRAPEKRLKIW 627

There is also some homology to SEQ ID 2384: 

Identities = 13/36 (36%), Positives = 25/36 (69%)

Query: 1   MRANIPVRNFQEFYDAFGVKKGDSMYLKPEKRLTLW 36
           +R N+ + NF  F++ F +K+GD+M+  P+ R+ +W
Sbjct: 596 LRTNVTLTNFDAFHETFDIKEGDAMWRAPKDRVIIW 631

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
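
The identity figure quoted in a BLAST-style summary can be checked directly from the two aligned rows. The following is a minimal sketch and not part of the original analysis: the function name `count_identities` is hypothetical, and the two strings are the Query and Sbjct rows of the Lactococcus lactis endopeptidase alignment above, with stray spaces removed.

```python
def count_identities(query: str, subject: str) -> tuple[int, int]:
    """Return (identical columns, alignment length) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    # A column counts as an identity when both rows carry the same residue
    # and that residue is not a gap character.
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


QUERY = "MRANIPVRNFQEFYDAFGVKKGDSMYLKPEKRLTLW"    # Query residues 1-36
SUBJECT = "LRANIPPTNLEEFYETFDVKETDQMYRAPEKRLKIW"  # Sbjct residues 592-627

identical, length = count_identities(QUERY, SUBJECT)
percent = 100 * identical // length  # integer percentage, truncated
print(f"Identities = {identical}/{length} ({percent}%)")
```

Run on these two rows, the count agrees with the 21/36 (58%) identities quoted for this alignment.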

Example 1238 

A DNA sequence (GBSx1314) was identified in S.agalactiae <SEQ ID 3833> which encodes the amino
acid sequence <SEQ ID 3834>. This protein is predicted to be endopeptidase O. Analysis of this protein
sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3801 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA16168 GB:L18760 endopeptidase [Lactococcus lactis] 
Identities = 118/268 (44%), Positives = 174/268 (64%), Gaps = 6/268 (2%)

Query: 1   [illegible] 60
           +G +YGKKYFGEAAK DV+ M +I VY+ RL N WLS+ T AI+KLD + IG+
Sbjct: 321 IGLFYGKKYFGEAAKADVKRMVTAMIKVYQVRLSKNEWLSQETAEKAIEKLDAITPFIGF 380

Query: 61  PEDYPDLYRQYQFDSKASFFENNDNYRKLSNKKTFEEFNQSNQREHWQMSANAVNAYNDP 120
           P+ P++Y + + S S +E+ + K+ +TFE+F++ + W M A+ VNAY P
Sbjct: 381 PDKLPEIYSRLKTTS-GSLYEDALKFDKILTARTFEKFSEDVDKTSWHMPAHMVNAYYSP 439

Query: 121 NTNSIVFPAAIFQSPLYDKTKTVSQNYGAIGAIIGHEISHSFDINGMKYDEKGNLHDWWT 180
           ++N+ IVFPAAI Q+P Y ++ SQNYG IGA+I HEISH+FD NG ++D++GNL+ WW
Sbjct: 440 DSNTIVFPAAILQAPFYSLEQSSSQNYGGIGAVIAHEISHAFDNNGAQFDKEGHUSTKWWL 499

Query: 181 KEDLKHYKKKTQAMIDQWDGLKADGGKVDGKLTLAENIADNGGVMASLEALKTEKIQTIK 240
           ED + +++K + MI +DG++ + G +GKL ++ENIAD GG+ A+L A K EK +K
Sbjct: 500 DEDYEAFEEKQKEMIALFDGVETEAGPANGKLIVSENIADQGGITAALTAAKDEKDVDLK 559

Query: 241 NFLNHGQVFGVKKQPKNKVSPQFSQMFM 268
           F+ K+KS+FQM +
Sbjct: 560 AFFSQW AKIWRMKASKEFQQMLL 582


There is also homology to SEQ ID 2384: 

Identities = 110/253 (43%), Positives = 161/253 (63%), Gaps = 1/253 (0%) 

1 MGDYYGKKYFGFAAKKDVEHMAKKIINVYKTRLKmrV^SENTKAmiKKLDNMRLMIGY 60 

+G +Y + F AK DVE ++I VYK+RL+ WL+ T+ AI KL+ + IGY 
324 LGLWYAGQKFSPFAKADvESKVARMIEVYKSRliETADWLAPATREKAITKLNVITPHIGY 383 

61 PEDYPDLYRQYQFDSKASFFENNDNYRKLSNKKTFEEFNQSNQREHWQMSANAWAYNDP 120 

PE P+ Y + D S EN N K++ T+ ++N+ R W M A+ VNAY D 
384 PEKLPETYAKKVIDESLSLVENAQNLAKITIAHTWSKW1WPVDRSEWHMPAHLVNAYYDL 443 

121 NTNS IVFPAAI FQSPLYDKTKTVSQNYGAIGAI IGHEISHSFDINGMKYDEKGNLHDWWT 180 

N IVFPAAI Q P Y ++ S NYG IGA+I HEISH+FD NG +DE G+L+DWWT 
444 QQNQIWPAAILQEPFYSLDQSSSANYGGIGAVIAHEISHAFDTNGASFDEHGSIiNDWWT 503 

181 KEDLKHYKKKTQAMIDQWDGLKADGGKVDGKLTLAENIADNGGVMASLEALKTEKIQTIK 240 

+ED +K++T ++ Q+DGL++ G KV+GKLT+ +EN+AD GGV +LEA ++E+ + + 
504 QEDYAAFKERTDKIVAQFDGLESHGAKVNGKLTVSENVADLGGVACALEAAQSEEDFSAR 563 

241 N-FLNHGQVFGVK 252 

+ F+N ++ +K 
564 DFFINFATIWRMK 576 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
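
The summary line that precedes each alignment packs three ratios into one string. As an illustration only, the sketch below pulls those ratios apart and recomputes the percentages so they can be compared with the quoted values. The names `SUMMARY_RE` and `parse_summary` are hypothetical, and the input is the summary line of the SEQ ID 2384 comparison above.

```python
import re

# Matches e.g. "Identities = 110/253 (43%)"; a space before the comma that
# separates fields does not matter because each field is matched on its own.
SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_summary(line: str) -> dict[str, tuple[int, int, int]]:
    """Map each field name to (count, alignment length, quoted percent)."""
    return {
        name: (int(count), int(total), int(percent))
        for name, count, total, percent in SUMMARY_RE.findall(line)
    }


line = "Identities = 110/253 (43%), Positives = 161/253 (63%), Gaps = 1/253 (0%)"
summary = parse_summary(line)
for name, (count, total, quoted) in summary.items():
    print(f"{name}: {count}/{total} = {100 * count / total:.1f}% (quoted {quoted}%)")
```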

Example 1239 

A DNA sequence (GBSx1315) was identified in S.agalactiae <SEQ ID 3835> which encodes the amino
acid sequence <SEQ ID 3836>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9691> which encodes amino acid sequence <SEQ ID 9692> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



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>GP:AAC35997 GB :AF019410 endopeptidase O [Lactobacillus helveticus] 
Identities = 85/315 (26%) , Positives = 146/315 (45%) , Gaps = 8/315 (2%) 

Query: 46 NVSPRENLYRAVlTONWIJOTKLKQGQTSWSFSEIEDKLKQLLVSDMAKmSGKIETTN- 104 
N P++NLY AVN WL+ ++ QTS +E++ K+++ ++ D A +ASGK + +

Sbjct: 20 NAKPQDNLYIAWSEWLSKAEIPADQTSAGVNTELDIKIEKRMMKDFADIASGKEKMPDI 79 

Query: 105 DEQKKMVAYYKQGMDFKTRDKNGLKPLKPVXiQKLFJW 164 
+ K +A YK +F RD P++ LQK+ + + F+ A + M + LPF 

Sbjct: 80 RDFDKAIALYKIAKNFDKRDAEKANPIQNDLQKILDLINFDKFKDNATELFMGPYALPFV 139

Query: 165 LTWTNARDNSQKQLVLRQAPALLESPDQYKKGNKEGFAKLSAYRTSAMALLKQAGKSNI 224 

V+ + ++ L L YK E + L ++ LL+ AG 

Sbjct: 140 FD VDADMKNTDFNVLHFGGPSTFLPDTTTYK- - TPEAKKLLD I LEKQS INLLEMAGIGKE 197 


Query: 225 EDRKLVKQAIAFDRLLSEKTQVTIQSKITAESETAAGRYNPESMETVHNYAKEFDFKELIE 284 

E R V+ A+AFD+ LS+ K T E A YNP S+ K FD + ++ 

Sbjct: 198 EARVYVQNALAFDQKLSKW KSTEEWSDYAAIYNPVSLTEFLAKFKSFDMADFLK 252 

Query: 285 KLVGPTNKAVJ^DKTYFKQWTOVINSKQL^ 344

++ + V V + + +++IN +K WM++ + + +L + R AA F 

Sbjct: 253 TILPEKVERVIVMEPRFLDHADEI.INPANFDEIKGWMLVKYINSVAKYLSQDFRAAAFPF 312 

Query: 345 KNVASGLTQIESKEK 359 
SG ++ S+ K

Sbjct: 313 NQAISGTPELPSQIK 327 

A related GBS gene <SEQ ID 8763> and protein <SEQ ID 8764> were also identified. Analysis of this 

protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 5.41
GvH: Signal Score (-7.5): -1.39
Possible site: 36
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 2.76 threshold: 0.0
PERIPHERAL Likelihood = 2.76 151
modified ALOM score: -1.05

*** Reasoning Step: 3

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8764 (GBS12) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 1 (lane 7; MW 65kDa). It was also expressed in E.coli as a His-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 3 (lane 3; MW 39kDa).

The GST-fusion protein was purified as shown in Figure 189, lane 4.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1240 

A DNA sequence (GBSx1317) was identified in S.agalactiae <SEQ ID 3839> which encodes the amino
acid sequence <SEQ ID 3840>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.75 Transmembrane 301 - 317 ( 299 - 317)
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Final Results 

bacterial membrane Certainty=0.1702 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB42180 GB:A67181 unnamed protein product [unidentified] 
Identities = 245/771 (31%) , Positives = 410/771 (52%) , Gaps = 80/771 (10%) 

Query: 22 WVIvEFNKESILDYATEQKKTVAQmQADVEKKLQSIKQEQDKVLKNIEKSVHFDSSKV 81
VRVIV NK + D+ ++ + A + + +E+ +K Q+KV+K +E+ +KV 
Sbjct: 97 VRVIVSLNKSAAFDHTSKPTGSAASVKK- - IEQASDQVKDGQEKVI KQVEE ITGNKV 151 

Query: 82 KR-YDAIINGVALDIQAQEIEKLKTIADVRRvYVSQEYVQTKPLLSSSGQLIGLPEVWNN 140 
+R + ++N ++D+ +I+K+K + V+ V + Y P S+ Q+ + +VW

Sbjct: 152 RRQFGYLVNAFSIDMDLDDIDKVKDLPQVKNVTPVKVY HPTDESADQMAQVQDVWQE 208 

Query: 141 SQYKGEGTWAVIDSGVDFKHQALKIKEPNRAKYNKTSIE KLIHEKNLKGKFYSEK 196 

+ KGEG V+++ID+G+D HQ LK+ +K+ +E KL H GK+Y+EK 
Sbjct: 209 QKLKGEGMVISIIDTGIDSSHQDLKLDSGVSTALSKSEVESDKSKLGH GKYYTEK 263

Query: 197 VPYGYNYYDYNDNLKDS - YGVMHGMHVTGIVGANDDNQKLYGVAPNAQI LAMKVFSDDQQ 255 

VPYGYNY D ND + D+ G MHG HV GI GAN ++ GVAP+AQ+LAMKVFS++ + 
Sbjct: 264 VPYGYNYADKNDQIVDNGCGEMHGQHVAGIAGANG QVKGVAPDAQLLAMKVFSNNAK 320 


Query: 256 NPTTFTDVWLKALDDAILLKADWNMSLGTPAGFVHEGKDYPELEVIARACKAGIVIAVA 315 

N + D + A++D++ L ADV+NMSLG+ + V G P+ + +A+A +AG++ ++ 
Sbjct: 321 NSGAYDDDIISAIEDSVKLGADVINMSLGSVSSDV--GPSDPQQQAVAKASEAGVINVIS 378 

Query: 316 AGNE GNITDGNTYGWPLAENYDTALIANPALDDNTIAVASMENLKKHAHVLKFK-- 370

AGN G+ DGN +E + + P + + L VAS EN K +K + 

Sbjct: 379 AGNSGVAGSTADGNPVNNTGTSE LSOTGTPGVTPDftLTVASAENSKVTTDWKDELG 435 

Query: 371 - DKKSGTEVTEVINIiHVAPNASKTIIGLAVDLGAGAPSELS - - KHFDLSGKIA 420 

+ K +VT + + + K + VD+G G + + K ++ G++A

Sbjct: 436 GVTFSSNSELKGAAQVTTQLESNYSVIjTKKLKIi VDMGLGGADDYTAEKKAEVKGQLA 492 

Query: 421 MLEIPEDNKSNGFLEK^QAITKLNPAAILLYNNAKVKDDLGSQLLVESEAAKFNIARITR 480 
+++ + F KV A I++YN+ D L S L + +++ 

Sbjct: 493 WK RGAYTFSAKVANAKAAGAAGIVIYNSE--DDGLLSMSLDDKTFPTLGMSKADG 546

Query: 481 STY NNIKNNSNKIITILTERQAIDNSIAGQLSSYSSWGPTPDLRLKPEITAPGGHI 536 

+ ++ + K T L IDNS AG++S ++SWGPTP+L KPEITAPGG I 

Sbjct: 547 KFWLKQQKKVRASRLKFGTAL IDNSRAGKMSDFTSWGPTPELDFKPEITAPGGKI 601 


Query: 537 FSTVEDNQYADKSGTSMAAPQVAGAAAVLKQYITDKKIPV--DNAADFIKLLLMNTAQPI 594 

+S DN+Y SGTSMA+P VAG+ A++ QI+ + ++ FK MNT+ P+ 
Sbjct: 602 YSLANDNKYQQMSGTSMAS PFVAGSEAL I LQG I KKQGLNLSGEELVQFAKNSAMNTSHPV 661 

Query: 595 IN-KQSKDGKTPYFVRQQGSGAMNIAKALTOTWATWGTNDNNADGKLELREL-KEKKF 652
+ + +K+ +P R+QGSG +N+ A+ TV N +G L+E+ ++ F 

Sbjct: 662 YDTEHTKEIISP RRQGSGEINVKDAINNTVEVKAA NGNGAAALKEIGRQTTF 713 

Query: 653 KARILLRNFGKTNKTYIISSFA--IADPVDEKGFRTQNSEHLVSKKADAVTRKVTVEAGK 710 
K + L N GK +TY + + + K +++ +V + T KVTV+ G+

Sbjct: 714 K- - VTLTNHGKKAQTYAVDNYGGPYTQATEAKSGEIYDTK- IVKGQLTTETPKVTVQPGE 770 

Query: 711 TLAVDLDVDYSDAEALTRNNFLEGYUJLK-DTEGVADLHLPFLGFYGSWTE 760 
+VD+ + + R NF+EGY+ + + +L LP++GF+GS+++ 
Sbjct: 771 --SVDVSFTLTLPYSFQRQNFVEGYVGFEAKDQATPNLVLPYMGFFGSYSQ 819

A related GBS gene <SEQ ID 8767> and protein <SEQ ID 8768> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: -8.37
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GvH: Signal Score (-7.5): -6.06 

Possible site: 15
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -1.75 threshold: 0.0
INTEGRAL Likelihood = -1.75 Transmembrane 301 - 317 ( 299 - 317)
PERIPHERAL Likelihood = 1.75 614
modified ALOM score: 0.85

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.1702 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF00677(358 - 3159 of 3255) 

EGAD|139899|149200 (95 - 1541 of 1946) prtB protein {Lactobacillus delbrueckii}
GP|1381114|gb|AAC41529.1||L48487 proteinase precursor {Lactobacillus delbrueckii}
PIR|JC6032|JC6032 lactocepin (EC 3.4.21.96) precursor [similarity] - Lactobacillus
delbrueckii subsp. bulgaricus
%Match =15.5 

%Identity =33.3 %Similarity =54.6 

Matches = 275 Mismatches = 343 Conservative Sub.s = 176 



318 348 378 408 438 468 498 528 

KAVTVTKPQGAVAEKATPAVPKPQKTOVIWFNKESILDYATEQK^^ 

lllll :|| : :|: :: = I = :|: =1 Mhl =1 = 

SKFQEAAKEQRQASGQAVSKKNESS VRVIVSLNKSAAFDHTSKPTGSAASV- - KKIEQASDQVKDGQEKVI KQVEE - - - 1 
90 100 110 120 130 140 



555 585 615 645 675 705 735 765 

DSSKVKR-YDAIINGVALDIQAQEIEKLKTIADVRRVYVSQEYVQTKPLLSSSGQLIGLPEvWNNSQyKGEGTVVAVIDS 

:||:| = ::| = = 1= = h I = I = 1= I = I I |: |: : :|| : |||| |:::||: 

TGNKVRRQFGYLWAFSIDMDLDDIDKVKDLPQVKNVTPVKVYHPT DESADQMAQVQDVWQEQKLKGEGMVISIIDT 

160 170 180 190 200 210 220 

795 825 855 885 915 942 972 1002 

GVDFKHQALKIKEPNRAKYNKTSIEKLIHEKNLKGKFYSEW 

hi || Ih :|: =1 I Ihhlllllllll I II = h I Ml II II III = 

GIDSSHQDLKLDSGVSTALSKSEVESDKS - KLGHGKYYTEKVPYGYNYADKNDQIVDNGCGEMHGQHVAGIAGAN - - -GQ 
240 250 260 270 280 290 



1032 1062 1092 1122 1152 1182 1212 1242 

LYGVAPNAQILAMKVFSDDQQNPTTFTDVWLKALDDAILLKADVVNMSLGTPAGFVHEGKDYPELEVIARACKAGIVIAV 

: lllhlhllllllh: =1 : I : l = = h = I 111 = 11111= =11 1= = =hl = lh = = 
VKG VAPDAQLLAMKVFSNNAKNSGAYDDD 1 1 SAI EDSVKLGAD VINMSLGSVS SD V- - GPSDPQQQAVAKASEAGVINVI 

310 320 330 340 350 360 370 



1272 1302 1326 1356 1386 1416 1656 
AAGNEGNITDGNTYGVKPLAENYDTAL- - IANPALDDNTLAVASMENLKKHAHVLKFKDJOCSGTEVTEV AAILLYN 

= 111 I hi 1= = I : I = = I III II I 

SAGNSG- -VAGSTADGNPVNNTGTSELSTVGTPGVTPDALTVASAENSK 

390 400 410 420 



1686 1716 1746 1776 1806 

NAKVKDDLGSQLLVESEAAKFNI ARITRSTYNNI KNNSNKI ITILTEROA 

I == I = =1 1= = =1 = == 1 I = 
OTTDWKTJELGGVTFSSNSELKG-AAQvTTQLESNYSVLTKKLKLVDMGLGGADDYT FWLKQQ 

430 440 450 460 470 480 

1824 1854 1884 1914 1944 1974 2004 

IDNSLAGQLSSYSSWGPTPDLRLKPEITAPGGHIFSTVEDNQYADKSGTSMAAPQVAGAAAVLKQY 

llll ||::| ::||||lhl =111111111 hi Ihl lllllhl llh h= I 
KKA^SRLKFGTALIDNSRAGKMSDFTSWGPTPELDFKPEITAPGGKIYSLANDNKYQQMSGTSMASPFVAGSEALILQG 
570 580 590 600 610 620 630 
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2058 2088 2115 2145 2175 2205 2235 

ITDKKIPV--DNAADFIKLLLMNTAQPIINKQ-SKDGKTPYFWQQGSGAM^^ 

| : : : : | | |||::|: : : :|: :| |:|||| :|: h |l I III h 

IKKQGLNLSGEELVQFAKNSAMNTSHPVYDTEHTKEI ISP- - -RRQGSGEINVKDAINNTV- -EVKAANGNGA- - -AALK 
S50 660 670 680 690 700 

2265 2295 2349 2379 2409 2439 2469 

ELKEKKFKARILLRNFGKTNKTYIISSEA- -IADPVDEKGFRTQNSEHLVSKKADAVTRKVTVEAGKTI^VDLDVDySDA 
|= :: :: | | || :|| : : : | ::: :| : | ||||: |:: ||: : 

EIG-RQTTFKVTLTNHGKKAQTYAVDNYGGPYTQATEAKSGEIYDTK- IOTGQLTTETPKVTVQPGES-- VDVSFTLTLP 
720 730 740 750 760 770 780 

2499 2526 2556 2586 2616 2646 2676 2706 

ealtrimflegy™lk-dtegvadlhlpflgfygswteqkaidafegiseigngdkkrrvqfywketnktsstfttngm 

:: I ||:|||: :: : :| ||::||:||:: | :: | : | || : :: | : : : | 
YSFQRQNFVEGYVGFFAKDQATPNLVLPYMGFFGSYS-OASVSA-PMLYEGGNSNLIOTIHSLVGVMFSNNNDMLGHTGY 
800 810 820 830 840 850 

2724 2754 2781 2811 2841 2871 2901 2931 

LSLPIYNNTVFFSPNSP- FYDKAGVRIAALRNMEYVQYS IIDPDTNKEVRVLGRSHDVRKLYRLDYRNSFAMMPDS 

I : : III I I = II = =11 111=11 = II 

EGDDYSKYTDPDLIAISPNGDGSRDYAYPVLFFDRNYKEYTETITDAQGNK-VKSLGVGKEGTKDYYSSSSGEWTTHSLD 
870 880 890 900 910 920 930 

2961 2991 3021 3051 3081 3111 

IWDGKIKD*IAKGDKQYIYQIKVQLNNKGVGGDGVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWK 

III I I llll II:: HI I : I : I : I :| I II : I 

KWDGTDADGQWKDGQYIY- - KVEFT- PAIGGQE - QELNI PVKVDSQAPEVSDLQVTKDGKLRLKAKDSGSGLDMTMFVA 
950 960 970 980 990 1000 1010 

3159 3189 3219 3249 

KITFKVQDTGIGLKDVYLQSVKYVGGGNNNLDLITPPGFKK 

I: II II 

AVNGEEQ VDGKSWTKLDKDTVQVAENGKVEFKYQDWGNESKVTTYEVKNIVKEVAAQPELKLTPDGEGKVKAELA 

1520 1530 1540 1550 1560 1570 1580, 

SEQ ID 8768 (GBS362N) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 149 (lane 10; MW 63.5kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 182 (lane 9; MW 38kDa) and in Figure 
149 (lanes 11 & 12; MW 38kDa). Purified GBS362N is shown in Figure 235, lanes 3 & 4.

GBS362C was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 149 (lane 14-16; MW 91kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 155 (lane 18; MW 66.3kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1241 

A DNA sequence (GBSx1318) was identified in S.agalactiae <SEQ ID 3841> which encodes the amino
acid sequence <SEQ ID 3842>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.04 Transmembrane 21 - 37 ( 17 - 38)

Final Results

bacterial membrane Certainty=0.2614 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

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The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA95000 GB:AB042239 PAa [Streptococcus criceti] 
Identities = 55/166 (33%) , Positives = 81/166 (48%) , Gaps = 24/166 (14%) 

Query: 5 KKTDKFGFRKSKVCRSLCGALLGTVAWSIATASTEIHADEATTSPTTVTKVPQPVQADT 64
K+ + FGFRKSK+ +SLCGALLGT WS+ A A++ TTS T+ DT 
Sbjct: 2 KRKETFGFRKSKISKSLCGALLGTAIWSV--AGQRALAEDMTTSTTSA VDT 51 

Query: 65 TALNTSKTHSTQATTTPVEAKENKWKSETVQSESRV- -MPRD - KWERPETVKAS VNS - 120 
TA+ ++T + +A + ++ Q+E + MP D EE VK++ +

Sbjct: 52 TAWGTETGNPATNLPEKQADSSSQAEASQAQAEQKTGSMPVDVATTELDEAVKSAAEAG 111 

Query: 121 -DVSQPITTTPPTI NEKTVEIPNLAQDTKKVAPKVTVTPE 159 

VSQ T T+ +EK+ EI D K A + +T E 

Sbjct: 112 VTVSQDETVDKGTVGTSQEADEKSGEI KADYSKQAET I KITTE 154

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3842 (GBS222) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 6; MW 22kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1242 

A DNA sequence (GBSx1319) was identified in S.agalactiae <SEQ ID 3843> which encodes the amino
acid sequence <SEQ ID 3844>. This protein is predicted to be CylK. Analysis of this protein sequence
reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3738 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1243 

A DNA sequence (GBSx1320) was identified in S.agalactiae <SEQ ID 3845> which encodes the amino
acid sequence <SEQ ID 3846>. This protein is predicted to be CylJ. Analysis of this protein sequence
reveals the following:

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1143 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9689> which encodes amino acid sequence <SEQ ID 9690>
was also identified.

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No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1244 

A DNA sequence (GBSx1321) was identified in S.agalactiae <SEQ ID 3847> which encodes the amino
acid sequence <SEQ ID 3848>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0913 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
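
Each "Final Results" block lists three candidate locations with a certainty score, and the predicted location is simply the one with the highest score. The sketch below shows one way such a block could be read programmatically. It is an illustration under stated assumptions, not the tool used in the original analysis: `RESULT_RE` and `parse_final_results` are hypothetical names, and the regular expression deliberately tolerates stray whitespace inside the number. The sample values are those reported for SEQ ID 3848 above.

```python
import re

# "bacterial <location> [---] Certainty=<number> (<verdict>)"; whitespace is
# allowed around "=" and inside the number so that "0. 0913" still parses.
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\((.*?)\)"
)


def parse_final_results(block: str) -> list[tuple[str, float, str]]:
    """Return (location, certainty, verdict) tuples, highest certainty first."""
    results = []
    for location, number, verdict in RESULT_RE.findall(block):
        certainty = float("".join(number.split()))  # drop embedded spaces
        results.append((location, certainty, verdict))
    return sorted(results, key=lambda r: r[1], reverse=True)


BLOCK = """
bacterial cytoplasm Certainty=0. 0913 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
"""

results = parse_final_results(BLOCK)
best_location, best_certainty, best_verdict = results[0]
print(f"predicted location: {best_location} ({best_certainty:.4f}, {best_verdict})")
```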

Example 1245 

A DNA sequence (GBSx1322) was identified in S.agalactiae <SEQ ID 3849> which encodes the amino
acid sequence <SEQ ID 3850>. This protein is predicted to be CylI (fabF). Analysis of this protein
sequence reveals the following:

Possible site: 24

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -2.39 Transmembrane 721 - 737 ( 721 - 738)
INTEGRAL Likelihood = -1.97 Transmembrane 326 - 342 ( 326 - 343)
INTEGRAL Likelihood = -0.43 Transmembrane 534 - 550 ( 534 - 550)

Final Results

bacterial membrane Certainty=0.1956 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9687> which encodes amino acid sequence <SEQ ID 9688>
was also identified.

There is also homology to SEQ ID 3852.

A related GBS gene <SEQ ID 8769> and protein <SEQ ID 8770> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 1.08
GvH: Signal Score (-7.5): -5.97
Possible site: 24
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 3 value: -2.39 threshold: 0.0
INTEGRAL Likelihood = -2.39 Transmembrane 712 - 728 ( 712 - 729)
INTEGRAL Likelihood = -1.97 Transmembrane 317 - 333 ( 317 - 334)
PERIPHERAL Likelihood = 3.45 492
modified ALOM score: 0.98

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*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.1956 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8770 (GBS361) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 73 (lane 4; MW 84kDa).

GBS361-His was purified as shown in Figure 213, lane 5.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1246 

A DNA sequence (GBSx1323) was identified in S.agalactiae <SEQ ID 3853> which encodes the amino
acid sequence <SEQ ID 3854>. This protein is predicted to be CylF. Analysis of this protein sequence
reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3766 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1247 

A DNA sequence (GBSx1324) was identified in S.agalactiae <SEQ ID 3855> which encodes the amino
acid sequence <SEQ ID 3856>. This protein is predicted to be CylE. Analysis of this protein sequence
reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3498 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1248 

A DNA sequence (GBSx1325) was identified in S.agalactiae <SEQ ID 3857> which encodes the amino
acid sequence <SEQ ID 3858>. This protein is predicted to be ABC transporter homolog CylB. Analysis of
this protein sequence reveals the following:

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Possible site: 56 



>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-13.90 Transmembrane 271 - 287 ( 263 - 291)
INTEGRAL Likelihood =-10.30 Transmembrane 17 - 33 ( 14 - 43)
INTEGRAL Likelihood = -8.60 Transmembrane 114 - 130 ( 106 - 138)
INTEGRAL Likelihood = -6.69 Transmembrane 152 - 168 ( 149 - 178)
INTEGRAL Likelihood = -1.97 Transmembrane 186 - 202 ( 185 - 202)

Final Results

bacterial membrane Certainty=0.6562 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9685> which encodes amino acid sequence <SEQ ID 9686>
was also identified.

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
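
The INTEGRAL rows reported for SEQ ID 3858 describe five predicted membrane-spanning segments. A minimal sketch of how such rows could be collected and ordered along the sequence is given below. The names `Segment`, `SEGMENT_RE` and `parse_segments` are hypothetical, and picking the "strongest" segment assumes that a more negative likelihood value indicates a more confident prediction, which this document does not itself state.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


# Captures the likelihood and the core span; the wider span in parentheses
# at the end of each row is ignored in this sketch.
SEGMENT_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text: str) -> list[Segment]:
    """Collect predicted transmembrane segments, ordered along the sequence."""
    found = [
        Segment(float(lik), int(start), int(end))
        for lik, start, end in SEGMENT_RE.findall(text)
    ]
    return sorted(found, key=lambda seg: seg.start)


REPORT = """
INTEGRAL Likelihood =-13.90 Transmembrane 271 - 287 ( 263 - 291)
INTEGRAL Likelihood =-10.30 Transmembrane 17 - 33 ( 14 - 43)
INTEGRAL Likelihood = -8.60 Transmembrane 114 - 130 ( 106 - 138)
INTEGRAL Likelihood = -6.69 Transmembrane 152 - 168 ( 149 - 178)
INTEGRAL Likelihood = -1.97 Transmembrane 186 - 202 ( 185 - 202)
"""

segments = parse_segments(REPORT)
strongest = min(segments, key=lambda seg: seg.likelihood)  # assumption: lower is stronger
for seg in segments:
    print(f"{seg.start:>4}-{seg.end:<4} likelihood {seg.likelihood:6.2f}")
```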

Example 1249 

A DNA sequence (GBSx1326) was identified in S.agalactiae <SEQ ID 3859> which encodes the amino
acid sequence <SEQ ID 3860>. This protein is predicted to be ABC transporter homolog CylA. Analysis of
this protein sequence reveals the following:

Possible site: 57

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4122 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9683> which encodes amino acid sequence <SEQ ID 9684> 
was also identified. A further related GBS gene <SEQ ID 8771> and protein <SEQ ID 8772> were also 
identified. Analysis of this protein sequence reveals homology to membrane protein ABC transporters. 

A further related DNA sequence was identified in S.pyogenes <SEQ ID 9085> which encodes the amino
acid sequence <SEQ ID 9086>. An alignment of the GAS and GBS sequences follows:

Score = 85.4 bits (208), Expect = 1e-18
Identities = 68/271 (25%), Positives = 129/271 (47%), Gaps = 17/271 (6%)

Query: 39 KGFTEQHVLKDINFDVYKGDFFGIVGRNGSGKSTLLKI ISQIYVPEKGQVT- -VDGKMVS 96 
K + L+DIN +G F+G++G NG+GK+TL ++ Q + G + VDGK +S

Sbjct: 10 KKYGSFEALRDINLIFEEGKFYGLI^PNGAGKTTLFNLLIQNFKQTSGDIKWEVDGKPLS 69 

Query: 97 FIELGVGF NPELTGRENVYMNGAMLGFTKDEVDDMYNDIVDFAELHHFMNQ 147 

+ +G+ F + LT EN+ GA+ G +K +V + D+ + ++ Q 

Sbjct: 70 IKDFYRHIGIVFQSNRLDDNLTVEENLISRGALYGLSKSQVRNRLKDLQTYLDITAIKKQ 129

Query: 148 KLKNYSSGMQ VRLAFSVAI KAQGDVLI LDE VLAVGDEAFQRKCNDYFME - RKDSGKTTI L 206 

K + SG+++ +A+ Q +L+LDE D +R D + + S T +L 

Sbjct: 130 KYGSLSGGQKRKVDIARALLPQPSLLLI£)EPTTGLDPQSRRDLVroAIAQLNQQSQMTVVL 189 


Query: 207 VTHDMGAVKKYCMRAvLIEDGLVKAYGEPFDVANQYSvDNTETA-EDAMNAEKISVSDIA 265 

+TH + + C+ ++ +G + G+ Q+S N + + +++S++D 

Sbjct: 190 ITHYLEEMSA-CDVLNVLIEGNIYYSGDIKSFIEQHSTTNLNVVLKPEKSLDQLSIADFV 248 

Query: 266 KDLKVSLISNPRITPNDTITFEVSYEVLKDD 296
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K ++S I D 1+ E +V+ D+ 
Sbjct: 249 N- -KCQVLSEREIVFKD- ISVEEMMQVISDN 276 

There is also homology to SEQ IDs 358, 482, 644, 686, 1832, 2529, 2720, 3882, 4028, 4104, 4280, 5090,
5498, 6034, 6500.

SEQ ID 8772 (GBS83) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 20 (lane 2; MW 37.6kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 21 (lane 5; MW 62.6kDa) and in Figure 
28 (lane 3; MW 62.6kDa). 

GBS83-GST was purified as shown in Figure 195, lane 6.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
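
Several proteins in this section were expressed both as GST fusions and as His fusions, and the apparent molecular weights of the two forms can be compared as a consistency check. The sketch below only restates figures quoted in the text; the dictionary name `REPORTED_MW` is hypothetical, and the expectation that the difference falls near 26 kDa rests on the commonly cited size of the GST moiety, which is an outside assumption rather than something stated in this document.

```python
# Apparent molecular weights (kDa) quoted in this section for proteins that
# were expressed both as a GST fusion and as a His fusion.
REPORTED_MW = {
    "GBS21L": {"GST": 66.5, "His": 41.5},
    "GBS12": {"GST": 65.0, "His": 39.0},
    "GBS362N": {"GST": 63.5, "His": 38.0},
    "GBS362C": {"GST": 91.0, "His": 66.3},
    "GBS83": {"GST": 62.6, "His": 37.6},
}

# Difference between the two fusion forms, rounded to one decimal place to
# hide floating-point noise.
deltas = {
    name: round(mw["GST"] - mw["His"], 1) for name, mw in REPORTED_MW.items()
}

for name, delta in deltas.items():
    print(f"{name}: GST fusion is {delta} kDa heavier than the His fusion")
```

On the quoted values, every difference lies between 24.7 and 26.0 kDa.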

Example 1250 

A DNA sequence (GBSx1327) was identified in S.agalactiae <SEQ ID 3861> which encodes the amino
acid sequence <SEQ ID 3862>. This protein is predicted to be acyl carrier protein homolog AcpC. Analysis
of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3451 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1251 

A DNA sequence (GBSx1328) was identified in S.agalactiae <SEQ ID 3863> which encodes the amino
acid sequence <SEQ ID 3864>. This protein is predicted to be CylG (fabG). Analysis of this protein
sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2651 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ ID 3866.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
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Example 1252 

A DNA sequence (GBSx1329) was identified in S.agalactiae <SEQ ID 3867> which encodes the amino
acid sequence <SEQ ID 3868>. This protein is predicted to be CylD. Analysis of this protein sequence
reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2030 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1253 

A DNA sequence (GBSx1330) was identified in S.agalactiae <SEQ ID 3869> which encodes the amino
acid sequence <SEQ ID 3870>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3219 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1254 

A DNA sequence (GBSx1331) was identified in S.agalactiae <SEQ ID 3871> which encodes the amino
acid sequence <SEQ ID 3872>. Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.97 Transmembrane 231 - 247 ( 226 - 251)
INTEGRAL Likelihood = -7.06 Transmembrane 141 - 157 ( 134 - 164)
INTEGRAL Likelihood = -2.76 Transmembrane 28 - 44 ( 26 - 44)
INTEGRAL Likelihood = -1.38 Transmembrane 123 - 139 ( 121 - 139)
INTEGRAL Likelihood = -0.32 Transmembrane 199 - 215 ( 199 - 215)

Final Results

bacterial membrane Certainty=0.4588 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB88836 GB:AL353832 putative integral membrane transport
protein. [Streptomyces coelicolor A3(2)]
Identities = 68/264 (25%), Positives = 123/264 (45%), Gaps = 10/264 (3%)

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Ouerv 6 RMHFIFIKQYMKQIMEYKIDFFVGVICVFLTQGIJSILLFLNVLFQHIPSLEGWTFQQIAFI 65 

R + + +++ M Y+ F + G F L+ + + ++F + +L G++ ++AF+ 
Sbjct: 34 RAYGLIAGMWIRSTimYRTSFALTAFGNFAMTALDFVAILLMFSRVDALGGYSLPEVAFL 93 

Query 66 YGFSLLPKGIDHLFFDNLWALGQRLIRKGEFDKYLTRPISPLFHVLVETFQVDALGELLV 125 

YG S + G+ L ++ LG+R +R G D L RP L V + F + LG ++ 
Sbjct: 94 YGLSGVSFGLADLAIGS^RLGRR-WDGTLDTIiLTOPAPVLAQVARDRFALRRLGRWQ 152 

Query 126 GF ILL - - STTVSS I S WTVPKVLLFI FI I PFATLIYTSLKIATSS IAFWTKQSGAVT YI F - 182 

G ++L + V I m KVLL + i+ ++ +A + F + + V F 

Sbjct: 153 GLLVLGYALVWDIDWTAAKVLLLPVALISGAGIFCAVFVAAGAFQFAAQDASEVANAFT 212 

Query 183 YMFNDFAKYPVAIYNNLLRWI I SFVIPFAFTAYYPAAYFLQDRNVYFNIGGVI LI 237 

Y +YP ++ L +FV+P AF + PA+Y L R ++ G + L 

Sbjct: 213 YGGTTMLQYPPTVFALDLWGATFVLPIAFVNWLPASYVL-GRPYPLDLPGWVAFTPPLA 271 

Query: 238 SLISFMVSLILWHKGVEVYESAGS 261 

+ ++ + W G+ Y S GS 

Sbjct: 272 AARCCALAGLAWRAGLRSYRSTGS 295 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3873> which encodes the amino acid 
sequence <SEQ ID 3874>. Analysis of this protein sequence reveals the following: 

Possible site: 49
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.86 Transmembrane 227 - 243 ( 225 - 251)
INTEGRAL Likelihood = -7.22 Transmembrane 141 - 157 ( 133 - 164)
INTEGRAL Likelihood = -6.37 Transmembrane 123 - 139 ( 114 - 140)
INTEGRAL Likelihood = -2.97 Transmembrane 26 - 42 ( 26 - 49)

Final Results
bacterial membrane --- Certainty=0.4545 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB88836 GB:AL353832 putative integral membrane transport 
protein. [Streptomyces coelicolor A3 (2) ] 
Identities = 69/262 (26%) , Positives = 125/262 (47%) , Gaps = 10/262 (3%) 

Query: 8 hAIFIKQYLKQI^YKVDEWGVLGVFLTQGLNLLFLSVLFQHIPSLEGWTFEQIAFIYG 67

+ + +++ M Y+ F + G F L+ + + ++F + +L G++ ++AF+YG 
Sbjct: 36 YGLIAGMWIRSTMAYRTSFALTAFGNFAMTALDFVAILLMFSRVDALGGYSLPEVAFLYG 95 

Query 68 FCLIPKGIDHLFFDNLWALGQRLVRKGEFDKYLTRPISPLFHVLVETFQVDALGELLVGV 127 

+ G+ L ++ LG+R VR G D L RP L V + F + LG ++ G+ 
Sbjct: 96 LSGVSFGLADLAIGSMERLGRR-VRDGTLDTLLWPAPVLAQVAADRFALRRLGRWQGL 154 

Query 128 ILL- -VTTAGSI VWTLPKVLLFILVIPFATLIYTSLKIATASISFWTKQSGAVIYIF-YM 184 

++L I WT KVLL + + I++++A + F+ +V FY 

Sbjct: 155 LVLGYALVVVDIDWTAAKVLLLPVALISGAGIFCAVFVAAGAFQFAAQDASEVANAFTYG 214 

Query 185 FNDFSKYPMSIYHSFLRWLISFIIPFAFTAYYPASYFLTGQHLLFNIGGLV WSL 239 

+YP +++ L +F++P AF + PASY L G+ ++ G V + + 

Sbjct: 215 GTTMLQYPPOTFALDLVRGATFVLPLAFVNWLPASYVL-GRPYPLDLPGWVAFTPPLAAA 273 

Query: 240 LVLALSLKLWKWGLDAYESAGS 261 

AL+ W+ GL +Y S GS 
Sbjct: 274 ACCALAGLAWRAGLRSYRSTGS 295 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 208/261 (79%) , Positives = 238/261 (90%) 

Query: 1 MTi rKYQR^FIFIKQYMKQI^YKIDFFVGVIi^ 60
M K + MR TFTKOY+KOIMEYK+DF VGVLGVFLTOGLNLLFL+VLFOHIPSLEGWTF+
Sbjct: 1 MAKLRCMHAI FIKQYLKQIMEYKVDFWGVLGWLTQGIiNLLFLSVLFQHI PSLEGWTFE 60

Query: 61 QIAFIYGFSLLPKGIDHLFFDNLWALGQRLIRKGEFDKYLTRPISPLFHVLVETFQVDAL 120
Sbjct: 61 QIAFIYGFCLIPKGIDHLFFDNLWALGQRLWKGEFDKYLTRPISPLFHVLVETFQVDAL 120

Query: 121 GELLVGFILLSTTVSSISWTVPKVLLFIFIIPFATLIYTSLKIATSSIAFWTKQSGAVIY 180
mTT.T.Vn TT,T. TT WT4-PTCVTJ.RT +TPFATT1TYTSLKIAT+SI+FWTKOSGAVTY
Sbjct: 121 GELLVGVILLVTTAGSIVWTLPKVLLFILVIPFATLIYTSLKIATASISFWTKQSGAVIY 180

Query: 181 IFYMFNDFAKYPVAIYNNLLRWIISFVIPFAFTAYYPAAYFLQDRNVYFNIGGVILISLI 240
IFYMFNDF+KYP++IY++ LRW+ISF+IPFAFTAYYPA+YFL +++ FNIGG++++SL+
Sbjct: 181 IFYMFNDFSKYPMSIYHSFLRWLISFIIPFAFTAYYPASYFLTGQHLLFNIGGLWVSLL 240

Query: 241 SFMVSLILWHKGVEVYESAGS 261
+SL LW G++ YESAGS
Sbjct: 241 VLALSLKLWKWGLDAYESAGS 261


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1255 

A DNA sequence (GBSx1332) was identified in S.agalactiae <SEQ ID 3875> which encodes the amino
acid sequence <SEQ ID 3876>. Analysis of this protein sequence reveals the following: 

Possible site: 54
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -15.60 Transmembrane 147 - 163 ( 134 - 178)
INTEGRAL Likelihood = -8.55 Transmembrane 119 - 135 ( 114 - 141)
INTEGRAL Likelihood = -7.86 Transmembrane 238 - 254 ( 235 - 260)
INTEGRAL Likelihood = -1.70 Transmembrane 215 - 231 ( 212 - 231)
INTEGRAL Likelihood = -1.06 Transmembrane 61 - 77 ( 61 - 77)
INTEGRAL Likelihood = -0.22 Transmembrane 27 - 43 ( 27 - 43)

Final Results
bacterial membrane --- Certainty=0.7241 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB88837 GB:AL353832 putative integral membrane protein.
[Streptomyces coelicolor A3 (2) ] 
Identities = 60/271 (22%), Positives = 118/271 (43%) , Gaps = 13/271 (4%) 

Query: 6 RRYKPFISTGIQGLITYRVDFILYRIGDVIGAFVAFYLWKAVFDSSSQSLIQGFQLSDMI 65 

R Y + G + TYR + + + Y + A++D Q + G+ + + 

Sbjct: 7 RLWAVAAGGFRRYATYRAATAAGVFTNTVFGLILVYTYLALWDEKPQ--LGGYDQAQAV 64 

Query: 66 LYI IMS - FVTNLLTRTDSSFM- - IGDEVKDGS I IMRLLRPVHFAASYLFMEIGSRWLI FL 122 

++ + + L F + + ++ G + + L RP +L ++G L 

Sbjct: 65 TFVTOjGQALLAAI^IGGGGFEDELMERIRTGDVAVDLYRPADLQLWWLAADVGRAVFQLL 124 

Query: 123 SIGV-PFLLVITGVRLFLGTDLIQAIVLWFYIISIILAFLINFFFNICFGFSAFVFKNL 181 

GV PF+ LF L + + + ++++++LA ++ F SAF + 

Sbjct: 125 GRGWPFVFG SLFFPVALPREVSWAAFLTOvvIjaWVGFALRYLVALSAFWLLDG 180 

Query: 182 WGSNLLKNSLVAFMSGSLIPLTFFPKIVADILGFLPFSSLIYTPVMIIIGKYDGSQIVQA 241 

G + F SG L+PL FP ++ D++ LP+SSL+ P +++G+ D + 
Sbjct: 181 TGVTQMAWIAGLFCSGMLLPLNVFPGVLGDVVRALPWSSLLC^PADVLLGEADP LGT 237 

Query: 242 LLLQI FWLI VMVALSQLIWKKVQLHITIQGG 272 

L Q W + ++AL +L+ + +QGG 

Sbjct: 238 YLFQASWAVALLALGRLVQSAATRRWVQGG 268 



WO 02/34771 -1398- PCT/GB01/04789



A related DNA sequence was identified in S.pyogenes <SEQ ID 3877> which encodes the amino acid 
sequence <SEQ ID 3878>. Analysis of this protein sequence reveals the following: 



Possible site: 50
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -9.18 Transmembrane 252 - 268 ( 248 - 277)
INTEGRAL Likelihood = -7.22 Transmembrane 161 - 177 ( 151 - 187)
INTEGRAL Likelihood = -6.10 Transmembrane 133 - 149 ( 128 - 160)
INTEGRAL Likelihood = -2.81 Transmembrane 213 - 229 ( 211 - 230)

Final Results
bacterial membrane --- Certainty=0.4673 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAF11144 GB:AE002002 conserved hypothetical protein [Deinococcus radiodurans] 
Identities = 56/268 (20%) , Positives = 113/268 (41%) , Gaps = 21/268 (7%) 

Query: 15 MWSFWKRYRPFLSAGIQELITYRVNFFLYRIGDVMGAFVAYYLWKAVFDSSKQSLINGFT 74 

M +FW++ R + + + YR ++ + + V +W S+ ING+T 

Sbjct: 1 MTNFWRKVKVLWAVSIiASTLEYRAETIIWMLSGTIiN-LvT«LVWMTQAKSAPGGQINGYT 59 

Query: 75 LSDMTFYIIMSFVTTLLTKSDSSFMIGEEVKDGSIIMRLLRPV HFAASYLFMEIG 129 

Y + +++ + L + + +++ G++ LL P+ FAA + 

Sbjct: 60 PQAFAGYFLATWLVSQLLWWVGWELDYKIRQGTLSPELLHPIDPLWREFAAH- -LTDKA 117 

Query: 130 FRWIVIiMSVGFPFLMVLSGIKVMAGLSILQVLASSCLYLVSLLIiAFL INFYFNICFG 186 

FR P ++VL + +AL+Q+ Y L LA L +F+ G 

Sbjct: 118 FR LP IMLVL - - LLI FARLTGAQFTSQWWAYPAVLGLALLGLCVRFLWEYTIjG 167 

Query: 187 SSAFVFKNLWGSNLLKNALVAFMSGSLIPLAFFPKMVSIVLSFLPFSSLVYTPVMIVIGK 246 

AF ++ + AG PL+F+P + + ++ PF ++ P ++ GK 

Sbjct: 168 LIAFVn'ESSSSFGEVLWLFYAAFGGMFAPLSFYPGWLQ/TLAAWTPFPYMLGLPAALLAGK 227 

Query: 247 YSLSQIMVALSLQIFWLLvMWLSQVIW 274 

S ++ + + + WL VM ++ + +W 
Sbjct: 228 ASGAEALRGAGVLLGWLAVMWLVRRWVW 255 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 199/268 (74%) , Positives = 236/268 (87%) 

Query: 5 WRRYKPFISTGIQGLITYRVDFILYRIGDVIGAFVAFYLWKAVFDSSSQSLIQGFQLSDM 64 

W+RY+PF+S GIQ LITYRV+F LYRIGDV+GAFVA+YLWKAVFDSS QSLI GF LSDM 
Sbjct: 19 WKRYRPFLSAGIQELITYRVNFFLYRIGDVMGAFVAYYLWKAVFDSSKQSLINGFTLSDM 78 

Query: 65 ILYIIMSFVTNLLTRTDSSFMIGDEVKDGSIIMRLLRPVHFAASYLFMEIGSRWLIFLSI 124 

YIIMSFVT LLT++DSSFMIG+EVKDGSIIMRLLRPVHFAASYLFMEIG RW++ +S+ 
Sbjct: 79 TFYIIMSFVTTLLTKSDSSFMIGEEVTOGSIIMRLLRPVHFAASYLFMEIGFRWIVLMSV 138 

Query: 125 GVPFLLVITGVRLFLGTDLIQAIVLWFYIISIILAFLINFFFNICFGFSAFVFKNLWGS 184 

G PFL+V++G+++ G ++Q + Y++S++LAFLINF+FNICFG SAFVFKNLWGS 

Sbjct: 139 GFPFLMVLSGIKVMAGLSILQVLASSCLYLVSLLLAFLINFYFNICFGSSAFVFKNLWGS 198 

Query: 185 NLLKNSL VAFMSGSLI PLTFFPKIVADILGFLPFSSLIYTPVMI I IGKYDGSQIVQALLL 244 

NLLKN+LVAFMSGSLIPL FFPK+V+ +L FLPFSSL+YTPVMI+IGKY SQI+ AL L 
Sbjct: 199 NLLKNALVAFMSGSLIPLAFFPKMVSIVLSFLPFSSLVYTPVMIVIGKYSLSQIMVALSL 258 

Query: 245 QIFWLIVMVALSQLIWKKVQLHITIQGG 272 

QIFWL+VMV LSQ+IWKKVQ H+TIQGG 
Sbjct: 259 QIFWLLVMWLSQVIWKKVQYHLTIQGG 286

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1256 

A DNA sequence (GBSx1333) was identified in S.agalactiae <SEQ ID 3879> which encodes the amino
acid sequence <SEQ ID 3880>. This protein is predicted to be an ABC transporter, ATP-binding protein.
Analysis of this protein sequence reveals the following:

Possible site: 31
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2013 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9681> which encodes amino acid sequence <SEQ ID 9682>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF09790 GB:AE001882 ABC transporter, ATP-binding protein 
[Deinococcus radiodurans] 
Identities = 141/331 (42%), Positives = 201/331 (60%), Gaps = 34/331 (10%)

Query: 10 MIEVSHLQKNFIKTVKAPGLKGAFQSFLRPEKHTFEAVKDLTFDVPKGQILGFIGANGAG 69 

MIEV HL K+F + AV+D++F +P G+I+G++G NGAG 

Sbjct: 46 MIEVRHLCKSFARK PAVQDISFSIPAGEIVGYLGPNGAG 84 

Query: 70 KSTTIKMLTGILKPTSGFCRIDGKLPQENRQNTVKDIGWFGQRTQLWWDLALQETYTVL 129

KSTTIK+LTG+L P SG R+ G +P + R+ +V +G VFGQRT LWWDL ++E+ +L 
Sbjct: 85 KSTTIKVLTGLLVPDSGEVRVGGLVPWKQRRQHVARLGAVFGQRTTLWWDLPVRESLELL 144 

Query: 130 KEIYDVPDKEFRKRMAFLNEVLELNDFIKDPTOTLSLGQRMRADIAASLLHNPKVLFLDE 189

+ +Y VP F + +A E+LEL F+ PR LSLGQRMRAD+AA+LLH+P++LFLDE 
Sbjct: 145 RHVYRVPAARFAENIiAGFTELLELGPFLNTPARAIiSLGQRMRADLAAALLHDPELLFLDE 204 

Query: 190 PTIGLDVSVKDNIRRAITQINQEEETTILLTTHDLSDIEQLCHRIFMIDRGQEIFDGTVS 249 
PT+GLDV K+ IR + +N E T+LLTTHDL D+E+L R+ MID G+ +FDG ++

Sbjct: 205 PWGLDWAKERIREFVKAWAERGVTVLLTTHDLGDWRLARRVMMIDTGRLLFDGPLA 264 

Query: 250 QLKETFGKMKTL- -SFDLRPGQEHISS-SLIGKSEINIKRNDLVLDIQYDSSRYQTADII 306 
+L+ +G + L F+ P Q + +L+G+ ++ Y S A I 

Sbjct: 265 ELQARYGGERELWVEFEKAPAQPALPGLTLLGQDGPRVR YGFSGAAAAPIA 315

Query: 307 QQTLADFSVRDLKMTDAD I EDI I RRFYRNEL 337 

Q T A VRDL + + ++E IRR Y L 
Sbjct: 316 QVT-ALAPVRDLAVKEPEVEATIRRIYEGNL 345 






A related DNA sequence was identified in S.pyogenes <SEQ ID 3881> which encodes the amino acid
sequence <SEQ ID 3882>. Analysis of this protein sequence reveals the following: 

Possible site: 60
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3315 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
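
The "Final Results" blocks in these examples all follow the same three-line pattern, so the predicted localization can be extracted mechanically. The following is a minimal sketch only: the function name `parse_final_results` and the regular expression are illustrative assumptions, not part of the original analysis software, and the pattern is deliberately tolerant of stray spaces inside the number.

```python
import re

# Hypothetical helper: matches lines such as
#   "bacterial cytoplasm --- Certainty=0.3315 (Affirmative) < succ>"
# and tolerates stray spaces inside the number (e.g. "0 .3315").
_RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*([\d.\s]+?)\s*\("
)

def parse_final_results(text):
    """Return {localization: certainty} for one Final Results block."""
    scores = {}
    for location, raw in _RESULT.findall(text):
        scores[location] = float(re.sub(r"\s+", "", raw))
    return scores

block = """
bacterial cytoplasm Certainty=0 .3315 (Affirmative) < suco
bacterial membrane Certainty=0. 0000 (Not Clear) < suco
bacterial outside Certainty=0. 0000 (Not Clear) < suco
"""
scores = parse_final_results(block)
predicted = max(scores, key=scores.get)  # localization with the highest certainty
```

For the block shown, `predicted` is `"cytoplasm"`, matching the line marked "Affirmative".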

An alignment of the GAS and GBS proteins is shown below. 

Identities = 272/330 (82%), Positives = 305/330 (92%)

Query: 8 MSMIEVSHLQKNFIKTVKAPGLKGAFQSFLRPEKHTFFJWKDLTFDVPKGQILGFIGANG 67

M MIEVSHLQKNF KT+K PGLKGA +SF+ P + FEAVKDL+F+VPKGQILGFIGANG 
Sbjct: 1 MVMIEVSHLQICNFSKTIKEPGIiKGMjKSFVHPPREIFEAVKDLSFEVPKGQILGFIGANG 60 


Query: 68 AGKSTTIKMLTGILKPTSGFCRIDGKLPQENRQNYVKDIGWFGQRTQLWWDLALQETYT 127 

AGKBTTIKMLTGILKPTSG+CRI+GK+PQ+NRQ YV+DIG WGQRTQLWWDLALQETY 
Sbjct: 61 AGKSTTIKMLTGILKPTSGYCRINGKIPQDNRQYYVRDIGAVFGQRTQLWWDLALQETYV 120 

Query: 128 VLKEIYDVPDKEFRKRmFI^VLEIJSroFIKDPWTLSLGQRMRADIAASLLHNPKVLFL 187

VLKEIYDVP+K FRKRM FLNEVL+LN+FIKDPWTLSLGQRMRADIAASLLHNPKVLFL 
Sbjct: 121 VLKE I YDVPEKAFRKRMDFLNE VLDLNEFI KDPVRTLSLGQRMRAD IAASLLHNPKVLFL 180 

Query: 188 DEPTIGLDVSVKDNIRRAITQINQEEETTILLTTHDLSDIEQLCHRIFMIDRGQEIFDGT 247 
DEPTIGLDVSVKDNIRRMTQINQEEETTILLTTHDLSDIEQLC RI MID+GQEIFDGT

Sbjct: 181 DEPTIGLDVSVKDNIRRAITQINQEEETTILLTTHDLSDIEQLCDRIIMIDKGQEIFDGT 240 

Query: 248 VSQLKETFGKMKTLSFDLRPGQEHISSSLIGKBEINIKRNDLVLDIQYDSSRYQTADIIQ 307 
V+QLK++FGKMK+LSF+L+PGQE + S +G +1 ++R++L LDIQYDSSRYQTADIIQ 
Sbjct: 241 VTQLKQSFGKMKSLSFELKPGQEQWSQFMGLPDITVERHELSLDIQYDSSRYQTADIIQ 300

Query: 308 QTLADFSVRDLKMTDADIEDIIRRFYRNEL 337 

+T+ADF+VRD+KMTD DIEDI+RRFYR EL 
Sbjct: 301 KTMADFAVRDVKMTDVDIEDIVRRFYRKEL 330

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1257 

A DNA sequence (GBSx1334) was identified in S.agalactiae <SEQ ID 3883> which encodes the amino
acid sequence <SEQ ID 3884>. This protein is predicted to be Fmt. Analysis of this protein sequence
reveals the following:

Possible site: 32
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -9.39 Transmembrane 21 - 37 ( 8 - 39)
INTEGRAL Likelihood = -7.75 Transmembrane 360 - 376 ( 359 - 381)

Final Results
bacterial membrane --- Certainty=0.4758 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
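
Each INTEGRAL line reports a likelihood, a core transmembrane span, and a wider span in parentheses. As a rough illustration of how such lines could be read into structured records, the sketch below parses them with a regular expression; the names `TMSegment` and `parse_tm_line` are hypothetical and the pattern is an assumption based only on the lines printed in this document.

```python
import re
from typing import NamedTuple, Optional, Tuple

class TMSegment(NamedTuple):
    likelihood: float          # more negative = stronger membrane prediction
    core: Tuple[int, int]      # span printed after "Transmembrane"
    extended: Tuple[int, int]  # span printed in parentheses

# Tolerates a leading margin number, missing spaces around "=",
# and a missing closing parenthesis.
_TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)?"
)

def parse_tm_line(line: str) -> Optional[TMSegment]:
    """Parse one INTEGRAL line; return None if the line does not match."""
    m = _TM_LINE.search(line)
    if m is None:
        return None
    a, b, c, d = (int(m.group(i)) for i in range(2, 6))
    return TMSegment(float(m.group(1)), (a, b), (c, d))

lines = [
    "INTEGRAL Likelihood = -9.39 Transmembrane 21 - 37 ( 8 - 39)",
    "INTEGRAL Likelihood = -7.75 Transmembrane 360 - 376 ( 359 - 381)",
]
segments = [parse_tm_line(line) for line in lines]
strongest = min(segments, key=lambda s: s.likelihood)
```

Here `strongest` is the 21 - 37 segment, the one with the most negative likelihood.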

A related GBS nucleic acid sequence <SEQ ID 8775> which encodes amino acid sequence <SEQ ID 8776> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 8.85
GvH: Signal Score (-7.5): -3.75
Possible site: 25
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 2 value: -9.39 threshold: 0.0
INTEGRAL Likelihood = -9.39 Transmembrane 21 - 37 ( 8 - 39)
INTEGRAL Likelihood = -7.75 Transmembrane 353 - 369 ( 352 - 374)
PERIPHERAL Likelihood = 4.24 92
modified ALOM score: 2.38

*** Reasoning Step: 3

Final Results
bacterial membrane --- Certainty=0.4758 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA24012 GB:AB009635 Fmt [Staphylococcus aureus] 
Identities = 72/279 (25%) , Positives = 125/279 (43%) , Gaps = 25/279 (8%) 

Query: 49 LHRFMRKMNVNGmiVSDNTGKPITISHGIITOGEvETDIEaSf--NKLFPMASLQKLMTGII 106

+ ++++ + NG + + +N GK + +S G + E I+N N +F + S QK TG++ 
Sbjct: 79 IDKYLQSSLFNGSVAIYEN-GK-LKMSKGYGYQDFEKGIKNTPNTMFLIGSAQKFSTGLL 136 

Query: 107 IQRLIDQDVLSEDDRLSQFFPQWGSNSITIHQLLTHTSGLREKGVKVSPYLKNEREQLQ 166 
+++L ++ ++ +D +S++ P K S I + L+ H SGL + K S KN + ++

Sbjct: 137 LKQLEEEHKININDPVSKYLPWFKTSKPIPLKDIMLHQSGLYK--YKSSKDYKNLDQAVK 194 

Query: 167 FCLKHYNFVNK-KSWYYSNINFSFLTGIATQVTGRTYAELVDDVIKNPLRLDDTQSYQSV 225 
K K K Y++ N+ L + +VTG++YAE I +PL+L T Y 

Sbjct: 195 AIQKRGIDPKKYKKHMYlTOGNYLvIAKVIEEvTGKSYAENYYTKIGDPLKLQHTAFYD- - 252

Query: 226 VNHDLVSPMRKNGKLNKINIF NQVSTAYGAGDFFTTPLNFWVLMRSFSKGYFFPT- 280 

+ K N + N + YGAG+ + TP + L+ + F 

Sbjct: 253 -EQPFKKYLAKGYAYNSTGLSFLRPNILDQYYGAGNLYMTPTDMGKMTQIQQYKLFSPK 311 


Query: 281 DEYTKHQNDAISHYYGGLYMHGRIVNSNGTFF 312 

+ TK D Y G Y + NG FF 
Sbjct: 312 ITNPLLHEFGTKQYPD EYRYGFYAKPTLNRLNGGFF 347 

There is also homology to SEQ ID 3886.

A related GBS gene <SEQ ID 8773> and protein <SEQ ID 8774> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 6
McG: Discrim Score: 14.89
GvH: Signal Score (-7.5): -3.75
Possible site: 25
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -9.39 threshold: 0.0
INTEGRAL Likelihood = -9.39 Transmembrane 14 - 30 ( 1 - 32)
PERIPHERAL Likelihood = 4.24 85
modified ALOM score: 2.38

*** Reasoning Step: 3

Final Results
bacterial membrane --- Certainty=0.4758 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

29.6/49.6% over 218aa
Bacillus cereus
GP|4127525| D-stereospecific peptide hydrolase Insert characterized
ORF00162 (478 - 1083 of 1644)
GP|4127525|emb|CAA09676.1||AJ011526 (67 - 285 of 389) D-stereospecific peptide hydrolase
{Bacillus cereus}
%Match = 5.8
%Identity = 29.5 %Similarity = 49.5
Matches = 62 Mismatches = 96 Conservative Sub.s = 42

330 360 390 420 450 480 510 540 

MILRRLFMVRKFLKSLLSLFLIAVIATGISVACFFFIPENKGNITPILLHRFMRKNNvNGMMIVSDNTGKPITISHGINR 
=1 :|: : : : | |:: : || : || 

T(2ASIiALLIAGSSLLYTTPTSIVKAEPTQNVSSSLQTNTQRDRTSVKQ

20 30 40 50 60 70 80 



570 600 630 660 705 735 753
GEVETDIENNKLFPMASLQKLMTGIIIQRLIDQDVLSEDDRLSQFFPQV KG--SNSITIHQLLTHTSGL REKG

: :: : | : |: | | :|: :: | || : ::| | I I lll===l lllh I I 

LRTKKPMKTDFRFRIGSVTKTFTATVVLQLVGENRLKLDDHIEDWLPGVIQGNGYDGNKITIQEILNHTSGIAEYSRSKD 
100 110 120 130 140 150 160 


807 834 864 894 924 954 978 

VKVS PYLKN- - EREQLQFCLKHY - NFWKKSWYYSNINFSFLTGIATQVTGRTYAELVDDVI KNPLRLDDT - - QS YQSW 

I = |: I == = =111 III = =1 = =111 =111 h= I II I =1 11 = 

VDFTDTKKSYTAEELVKMGISFPPDFAPGKGWSYSNTGYVLLGILIEKvTGNSYAEEV^ 
180 190 200 210 220 230 240

993 1023 1053 1083 1113 1143 1173 1203 

NH--DLVSPMRKNGKI^KINIFKQVSTAYGAGDFFTTPLNFWVLMRSFSKGYFFPTDEYTKHQNDAISHYYGGLYMH 

II II :| : :| | | | | : | : : : : | : : : : | 

PGTNHARGYVQP-DGASELKDVTYYN-PSAGSSAGDMISTADDIJNKFFSYLLGGKLLKEQQLKQMLTTVPTGKEGIDGYG
260 270 280 290 300 310 320 

SEQ ID 8776 (GBS61) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 33 (lane 3; MW 68kDa). 

GBS61-GST was purified as shown in Figure 195, lane 5.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1258 

A DNA sequence (GBSx1335) was identified in S.agalactiae <SEQ ID 3887> which encodes the amino
acid sequence <SEQ ID 3888>. Analysis of this protein sequence reveals the following:

Possible site: 32
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2398 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1259 

A DNA sequence (GBSx1336) was identified in S.agalactiae <SEQ ID 3889> which encodes the amino
acid sequence <SEQ ID 3890>. Analysis of this protein sequence reveals the following:

Possible site: 28
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -5.57 Transmembrane 16 - 32 ( 13 - 33)

Final Results
bacterial membrane --- Certainty=0.3230 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1260 

A DNA sequence (GBSx1337) was identified in S.agalactiae <SEQ ID 3891> which encodes the amino
acid sequence <SEQ ID 3892>. Analysis of this protein sequence reveals the following:

Possible site: 14
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3910 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1261 

A DNA sequence (GBSx1338) was identified in S.agalactiae <SEQ ID 3893> which encodes the amino
acid sequence <SEQ ID 3894>. Analysis of this protein sequence reveals the following:

Possible site: 37
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4239 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1262 

A DNA sequence (GBSx1339) was identified in S.agalactiae <SEQ ID 3895> which encodes the amino
acid sequence <SEQ ID 3896>. Analysis of this protein sequence reveals the following:

Possible site: 32
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4349 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1263 

A DNA sequence (GBSx1340) was identified in S.agalactiae <SEQ ID 3897> which encodes the amino
acid sequence <SEQ ID 3898>. Analysis of this protein sequence reveals the following: 

Possible site: 16
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4962 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1264 

A DNA sequence (GBSx1341) was identified in S.agalactiae <SEQ ID 3899> which encodes the amino
acid sequence <SEQ ID 3900>. Analysis of this protein sequence reveals the following: 
Possible site: 29
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4014 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG38044 GB:AF295925 Orf28 [Streptococcus pneumoniae] 
Identities = 23/35 (65%) , Positives = 28/35 (79%) 

Query: 9 LIHWEGNSGDKLIEHQTSATGWYYQVDRSFSQPKG 43 

L +WEGNSGDKL+E QT AT WYYQ+++ FSQ G 
Sbjct: 180 LTYWEGNSGDKLLERQTRATEWYYQIEKGFSQTNG 214 
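
The "Identities" figure reported with each alignment can be checked directly from the aligned Query and Sbjct strings. The sketch below is illustrative only: the function name `identity` is an assumption, and the integer truncation of the percentage is chosen because it reproduces the 23/35 (65%) printed for this alignment.

```python
def identity(query: str, subject: str):
    """Count identical positions in two aligned, equal-length sequences.

    Returns (matches, aligned_length, percent), with percent truncated
    to an integer. Positions where the query has a gap ('-') never count.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return matches, len(query), matches * 100 // len(query)

# Aligned strings copied from the Orf28 alignment above.
query = "LIHWEGNSGDKLIEHQTSATGWYYQVDRSFSQPKG"
sbjct = "LTYWEGNSGDKLLERQTRATEWYYQIEKGFSQTNG"
result = identity(query, sbjct)  # (23, 35, 65)
```

The "Positives" count is not reproduced here because it depends on a substitution matrix, which this document does not specify.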

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1265 

A DNA sequence (GBSx1342) was identified in S.agalactiae <SEQ ID 3901> which encodes the amino
acid sequence <SEQ ID 3902>. Analysis of this protein sequence reveals the following: 

Possible site: 26
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2036 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 1266 

A DNA sequence (GBSx1343) was identified in S.agalactiae <SEQ ID 3903> which encodes the amino
acid sequence <SEQ ID 3904>. Analysis of this protein sequence reveals the following: 

Possible site: 47
>>> Seems to have a cleavable N-term signal seq.

Final Results
bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10933> which encodes amino acid sequence <SEQ ID 
10934> was also identified. 

SEQ ID 3904 (GBS153) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 25 (lane 3; MW 22kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 31 (lane 4; MW 47kDa). 

GBS153-GST was purified as shown in Figure 198, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1267

A DNA sequence (GBSx1344) was identified in S.agalactiae <SEQ ID 3905> which encodes the amino
acid sequence <SEQ ID 3906>. Analysis of this protein sequence reveals the following: 

Possible site: 13
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2036 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1268

A DNA sequence (GBSx1345) was identified in S.agalactiae <SEQ ID 3907> which encodes the amino
acid sequence <SEQ ID 3908>. Analysis of this protein sequence reveals the following: 

Possible site: 19
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2570 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA59773 GB:X85787 tasA [Streptococcus pneumoniae] 
Identities = 18/33 (54%) , Positives = 28/33 (84%) 

Query: 2 DVQSDENFAFKIFKVAKAKGLSLDVFDKLVGRF 34 

+ QSD+N F++FKV+K KG++LD FD+++GRF 
Sbjct: 320 EYQSDKNPFFEVFKVSKTKGIALDPFDEIIGRF 352 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3909> which encodes the amino acid 
sequence <SEQ ID 3910>. Analysis of this protein sequence reveals the following: 

Possible site: 56
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2405 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.
Identities = 18/34 (52%), Positives = 25/34 (72%)

Query: 1 MDVQSDENFAFKIFKVAKAKGLSLDVFDKLVGRF 34
+DVQSDE+F FK+ KV K+KG+ L+ D+ V F
Sbjct: 31 LDVQSDEDFGFKWKVLKSKGIVLNALDESVCGF 64

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1269

A DNA sequence (GBSx1346) was identified in S.agalactiae <SEQ ID 3911> which encodes the amino
acid sequence <SEQ ID 3912>. This protein is predicted to be a fimbria-associated protein. Analysis of this
protein sequence reveals the following:

Possible site: 52
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.17 Transmembrane 169 - 185 ( 168 - 185)

Final Results
bacterial membrane --- Certainty=0.1468 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC13546 GB:AF019629 putative fimbria-associated protein
[Actinomyces naeslundii]
Identities = 53/109 (48%), Positives = 75/109 (68%)

Query: 13 IPKINQDLPIYAGSEEDNLQRGVGHLEGISLPIGGASTHAVLSGQRGMPAARLFADLDKM 72
IP I+ DLP+Y G+ +D L +G+GHLEG SLP+GG T +V++G RG+ A +F +LDK+
Sbjct: 93 IPSISLDLPVYHGTADDTLLKGLGHLEGTSLPVGGEGTRSVITGHRGLAEATMFTNLDKV 152

Query: 73 KKGDYFYVTNLKETLAYQVDRIMVIEPSQLDAVSIEEDKDYVTLLTCTP 121
K GD V EL Y+V V+EP + +A+ +EE KD +TL+TCTP
Sbjct: 153 KTGDSLIVEVFGEVLTYRVTSTKWEPEETEALRVEEGKDLLTLVTCTP 201

There is also homology to SEQ ID 3740 and to SEQ ID 3910. 

SEQ ID 3912 (GBS194) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 177 (lane 2; MW 24kDa).

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1270 

A DNA sequence (GBSx1347) was identified in S.agalactiae <SEQ ID 3913> which encodes the amino
acid sequence <SEQ ID 3914>. Analysis of this protein sequence reveals the following: 

Possible site: 42
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -5.15 Transmembrane 880 - 896 ( 876 - 898)
INTEGRAL Likelihood = -4.78 Transmembrane 24 - 40 ( 23 - 42)

Final Results
bacterial membrane --- Certainty=0.3060 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
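
Across the examples in this section, the reported membrane certainty tracks the most negative INTEGRAL likelihood closely. The sketch below records that empirical pattern only; the linear form `0.1 + 0.04 * |likelihood|` is a fit to the values printed here and is an assumption, not the documented formula of the prediction program.

```python
def approx_membrane_certainty(likelihood: float) -> float:
    """Empirical fit (assumption): certainty ~ 0.1 + 0.04 * |most negative likelihood|."""
    return 0.1 + 0.04 * abs(likelihood)

# (most negative INTEGRAL likelihood, reported membrane certainty) pairs
# taken from the listings in this section.
observed = [
    (-5.15, 0.3060),
    (-5.57, 0.3230),
    (-8.86, 0.4545),
    (-9.39, 0.4758),
    (-15.28, 0.7114),
]

# Largest absolute deviation between the fit and the reported values.
max_error = max(abs(approx_membrane_certainty(l) - c) for l, c in observed)
```

For these pairs the fit stays within a few ten-thousandths of the reported certainty, which suggests the relationship is close to linear but not exactly so.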

A related GBS nucleic acid sequence <SEQ ID 8777> which encodes amino acid sequence <SEQ ID 8778> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8
SRCFLG: 0
McG: Length of UR: 20
Peak Value of UR: 2.80
Net Charge of CR: 5
McG: Discrim Score: 10.81
GvH: Signal Score (-7.5): -3.76
Possible site: 29
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 2 value: -5.15 threshold: 0.0
INTEGRAL Likelihood = -5.15 Transmembrane 867 - 883 ( 863 - 885)
INTEGRAL Likelihood = -4.78 Transmembrane 11 - 27 ( 10 - 29)
PERIPHERAL Likelihood = 7.58 531
modified ALOM score: 1.53
icml HYPID: 7 CFP: 0.306

*** Reasoning Step: 3

Final Results
bacterial membrane --- Certainty=0.3060 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



LPXTG motif: 859-863 



50 No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8778 (GBS104) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 27 (lane 5; MW 95kDa). 

GBS104-His was purified as shown in Figure 221, lanes 9-10. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
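
The LPXTG motif coordinates reported above (859-863 for SEQ ID 8778) are the kind of result a simple pattern search can reproduce. The following is a minimal Python sketch, assuming the canonical Leu-Pro-X-Thr-Gly pattern; the helper name `find_lpxtg` and the toy sequence are hypothetical and are not taken from the sequence listing.

```python
import re

# Canonical cell-wall anchoring motif: Leu-Pro-(any residue)-Thr-Gly.
LPXTG = re.compile(r"LP.TG")

def find_lpxtg(seq):
    """Return (start, end, motif) tuples using 1-based inclusive coordinates."""
    return [(m.start() + 1, m.end(), m.group()) for m in LPXTG.finditer(seq)]

# Toy sequence for illustration only (hypothetical, not a GBS sequence).
toy = "MKKAAALPKTGEEAA"
print(find_lpxtg(toy))  # [(7, 11, 'LPKTG')]
```

Coordinates are returned 1-based and inclusive so that they can be compared directly with the "LPXTG motif: start-end" lines in the analyses.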

Example 1271 

A DNA sequence (GBSx1348) was identified in S.agalactiae <SEQ ID 3915> which encodes the amino 
acid sequence <SEQ ID 3916>. This protein is predicted to be a fimbria-associated protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-15.28 Transmembrane 257 - 273 ( 252 - 280) 
INTEGRAL Likelihood = -7.11 Transmembrane 19 - 35 ( 16 - 39) 

Final Results 

bacterial membrane --- Certainty=0.7114 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC13546 GB:AF019629 putative fimbria-associated protein 
[Actinomyces naeslundii] 
Identities = 79/178 (44%), Positives = 112/178 (62%), Gaps = 7/178 (3%) 

Query: 65 RIALANAYNETLSRNPLL IDPFTSKQKEGLREYARMLEVHEQ- - IGHVAIPSIGV 117 

++ A+AYN+ LS +L + K+ +YA +L+ + + + + IPSI + 

Sbjct: 39 QVEQAHAYNDALSAGAVLEANNHVPTGAGSSKDSSLQYANILKANNEGLMARLKIPSISL 98 


Query: 118 DIPIYAGTSETVLQKGSGHLEGTSLPVGGLSTHSVLTAHRGLPTARLFTDLNKVKKGQIF 177 

D+P+Y GT++ L KG GHLEGTSLPVGG T SV+T HRGL A +FT+L+KVK G 
Sbjct: 99 DLPVYHGTADDTLLKGLGHLEGTSLPVGGEGTRSVITGHRGLAEATMFTNLDKVKTGDSL 158 

Query: 178 YVTNIKETLAYKWSIKVVDPTALSEVKIVNGKDYITLLTCTPYMINSHRLLVKGERI 235 

V EL Y+V S KW+P +++ GKD +TL+TCTP IN+HR+L+ GERI 

Sbjct: 159 IVEVFGEVLTYRVTSTKVVEPEETEALRVEEGKDLLTLVTCTPLGINTHRILLTGERI 216 

There is also homology to SEQ ID 3740. 

SEQ ID 3916 (GBS208) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 5; MW 35kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 85 (lane 8; MW 59.7kDa) and in Figure 
160 (lane 5; MW 60kDa). 

GBS208-GST was purified as shown in Figure 224, lanes 7-8. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1272 

A DNA sequence (GBSx1349) was identified in S.agalactiae <SEQ ID 3917> which encodes the amino 
acid sequence <SEQ ID 3918>. This protein is predicted to be a fimbria-associated protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -9.13 Transmembrane 265 - 281 ( 260 - 284) 

Final Results 

bacterial membrane --- Certainty=0.4652 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC13546 GB:AF019629 putative fimbria-associated protein 
[Actinomyces naeslundii] 
Identities = 96/265 (36%), Positives = 150/265 (56%), Gaps = 10/265 (3%) 

Query: 41 QASHANINAFKEAVTKIDRVEINRRLELAYAYNASI-AGAKTNGEYPALKDPYSAEQKQA 99 
Q + + + A A R+ ++E A+AYN ++ AGA PA + 

Sbjct: 15 QYNQSKOTADYSAQVDGARPDAKTQVEQAHAYNDALSAGAVLEANNHV PTGAGSSKD 71 

Query: 100 GWEYARMLEVKEQ--IGHVIIPRINQDIPIYAGSAEENLQRGVGHLEGTSLPVGGESTH 157 
++YA +L+ + + + IP I+ D+P+Y G+A++ L +G+GHLEGTSLPVGGE T 

Sbjct: 72 SSLQYANILKANNEGLMARLKIPSISLDLPVYHGTADDTLLKGLGHLEGTSLPVGGEGTR 131 

Query: 158 AVLTAHRGLPTAKLFTNLDKVTVGDRFYIEHIGGKIAYQVDQIKVIAPDQLEDLYVIQGE 217 
+V+T HRGL A +FTNLDKV GD +E G + Y+V KV+ P++ E L V +G+ 
Sbjct: 132 SVITGHRGLAEATMFTNLDKVKTGDSLIVEVFGEVLTYRVTSTKVVEPEETEALRVEEGK 191 

Query: 218 DHVTLLTCTPYMINSHRLLTOGKRI-PYWKWQKDSKTFRQQQYLTYAMWVWGLILLS 276 

D +TL+TCTP IN+HR+L+ G+RI P K + K + +A+ + GLI++ 

Sbjct: 192 DLLTLVTCTPLGINTHRILLTGERIYPTPAKDIAAAGKRPDVPHFPWWAVGLAAGLIWG 251 


Query: 277 LLIW FKKTKQKKRRKNEKAASQ 298 

L +W + + K+R A+Q 
Sbjct: 252 LYLWRSGYAAARAKERALARARAAQ 276 

There is also homology to SEQ ID 3740. 

SEQ ID 3918 (GBS209) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 50 (lane 4; MW 62kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 85 (lane 3; MW 37.2kDa). 

GBS209-His was purified as shown in Figure 221, lane 8. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1273 

A DNA sequence (GBSx1350) was identified in S.agalactiae <SEQ ID 3919> which encodes the amino 
acid sequence <SEQ ID 3920>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -9.66 Transmembrane 281 - 297 ( 276 - 300) 

Final Results 

bacterial membrane --- Certainty=0.4864 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04080 GB:AP001508 unknown [Bacillus halodurans] 
Identities = 45/141 (31%), Positives = 63/141 (43%), Gaps = 20/141 (14%) 



Query: 153 TGELDLLKVGVDGDTKKPLAGWFELYEKNGRTPIRVKNGVHSQDIDAAKHLETDSSGHI 212 
TG L++ KV D DT + L G F LY+ G IR LET G 

Sbjct: 1084 TGSLEVTKV- -DADTGE VLQGATFTLYDSEGEFAIRT LETGEDGKA 1127 

Query: 213 RISGLIHGDYVLKEIETQSGYQIGQAETAVTIEKSKTVTVTIENKKVPTPKVPSRGGL-I 271 

L++GDY+LKE GY +G +T + VT+EN+K +V + G + + 

Sbjct: 1128 TFWLLYGDYLLKEDSAPEGYLVGINDTQROTIDTVLHEVTVENEKSDINRVSAVGAVQL 1187 

Query: 272 PKTGEQQAMALVIIGGILIAL 292 
K E+ +L G L AL 

Sbjct: 1188 QKVDEETGESL QGALFAL 1205 

Identities = 64/259 (24%), Positives = 113/259 (42%), Gaps = 48/259 (18%) 

Query: 16 GTMFGISQT VLAQETHQLTI VHLEARDIDRPNP QLEIAPKE-GTPIEGVLYQL 67 

G + GI+ T + H++T+ + E DI+R + QL+ +E G ++G L+ L 

Sbjct: 1147 GYLVGI1TOTQROTIDTVI.HEVTVEN-EKSDINRVSAVGAVQLQKVDEETGESLQGALFAL 1205 

Query: 68 YQLKSTEDGDLLAHWNSLTITELKKQAQQVFEATTNQQGKATFNQLPDGIYYGL AV 123 

Q E +TI E++ ++A + + G F+L +YL V 

Sbjct: 1206 QQKVDDE FVT1AEMETDEEGIVFAGSLEPGDYQFVELNAPVGYKLDETPW 1256 

Query: 124 KAGEKNRNVSAFLVDLSEDKVIYPKIIWSTGELDLLKVGVDGDTiCKPLAGWFELYEKNG 183 

E++R + ++L ++ + P G + L+KV DD LGFL+G 

Sbjct: 1257 FTVEEDRTET IELQKENHLIP GSVQLVKVDAD-DAANTLEGAEFTLLDGEG 1306 

Query: 184 RTPIRVKNGVHSQDIDAAKHLETDSSGHIRISGLIHGDYVLKEIETQSGYQIGQAETAVT 243 

V+ G L TD +G + ++ L G+Y E + +GY++ T 

Sbjct: 1307 NV VQEG LTTDENGQVWTDLKPGEYQFVETKAPAGYELEATPIGFT 1352 

Query: 244 IEKS- - KTVTVTIENKKVP 260 

IE++ + TV +EN +P 
Sbjct: 1353 IERNQQEVATVAVENHLIP 1371 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3920 (GBS52) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 7 (lane 4; MW 30.5kDa). 

GBS52-His was purified as shown in Figure 192, lane 8. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1274 

A DNA sequence (GBSx1351) was identified in S.agalactiae <SEQ ID 3921> which encodes the amino 
acid sequence <SEQ ID 3922>. Analysis of this protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -6.26 Transmembrane 554 - 570 ( 551 - 575) 
INTEGRAL Likelihood = -0.16 Transmembrane 34 - 50 ( 34 - 50) 

Final Results 

bacterial membrane --- Certainty=0.3506 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 8779> which encodes amino acid sequence <SEQ ID 8780> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 
McG: Discrim Score: -5.81 
GvH: Signal Score (-7.5): -1.92 

Possible site: 37 
>>> Seems to have a cleavable N-terminal signal sequence 
ALOM program count: 2 value: -6.26 threshold: 0.0 
INTEGRAL Likelihood = -6.26 Transmembrane 527 - 543 ( 524 - 548) 
PERIPHERAL Likelihood = 5.36 194 
modified ALOM score: 1.75 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.3506 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

LPXTG motif: 521-525 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA57459 GB:X81869 orf2 [Lactobacillus leichmannii] 
Identities = 140/505 (27%), Positives = 220/505 (42%), Gaps = 94/505 (18%) 

Query: 102 GEVISNYAKLGDNVKGLQGVQFKRYKVKTDI SVDELKKLTTVEAADAKVGTILEE 156 

GE+++++ G LGVFKYV SD + T +DAK L 

Sbjct: 58 GEIMNDFGGTG LNGVTFKAYNVTDHYLSLRKSGDSAQDAVTAIQSDAKDSDNLPS 112 

Query: 157 - -GVSLPQKTNAQGLWDAL DSKSNVR-YLYVEDLKNSPSNITKAYAVPFV 204 

G ++ +T A D + DS N + YL+VE +SP+++T+ A P V 

Sbjct: 113 YAGSAIATETTATSKGEDGIAAFDNLNLKDSDGNYQTYLFVET- -DSPTDVTQQ-AAPIV 169 


Query: 205 LELPVANSTGTGFLS - E INI YPKNWTDEPKTDKDVKKLGQDDAGYT I G 252 

L +P+ ++ T ++ +1 IYPKNV + P T KD+ + + D T+ G 
Sbjct: 170 LTMPIYKTSDTSAINHDIQIYPKNVKST-PIT-KDLDEASKKDLAVTLPDGSTIYNAQYG 227 

Query: 253 EEFKWFLKSTIPANLGDYEKFEITDKFADGLTYKSVGKIKIGSKTIiNRDEHYTIDEPTVD 312 
+ F + + +PN+D + F + DK G+ + + L+ YT+++ 
Sbjct: 228 KSFGYNITVNVPWNIKDKDTFNWDKPDTGI DIDASTVSIDGLTKSTDYTVNK 280 

Query: 313 NQNTLKITFKPEKFKEIAELLKGMTLvTQNQnALDKATANTDDAAFLEIPVASTINEKAVL 372 
N ++ FK + L G +L I +T+ A 

Sbjct: 281 KDNGYQWFKTTS- -AAVQALAGKSLT ITYKATLTNNATP 318 

Query: 373 GKAIENTFELQYDHTPDKADNPKPSNPPRKPEVHTGGKRFVKKDSTETQTLGGAEFDLLA 432 
KAI NT L + + S P P ++TGG +FVKKDS +TL GAEF L+ 

Sbjct: 319 DKAIGNTATLSIGNGTNIT STPANGPRIYTGGAQFVKKDSQSNKTLAGAEFQLVK 373 

Query: 433 --SDGTAVKWTDALIK7ANTNKNYIAGEAVTGQPIKLKSHTDGTFEIKGLAYAVDANAEGT 490 

S+G V+ +NAEAT S+G +KGL+Y ++ + 
Sbjct: 374 vDSNGNIVSYATQASDGSYTWNDSATEATT YTSDANGLVALKGLSY SDKLDS 425 


Query: 491 AVTYKLKETKAPEGYVIPDKEIEFWSQTSYNTKPTDITVDSADATPDTIKNNKRPSIPN 550 

+Y L E +AP+GY D ++F+++Q S+ D+ TI N K +P+ . 

Sbjct: 426 GESYALLEIQAPDGYAKLDSPVKFSITQGSF GDSNKITIDNTKEGLLPS 474 

Query: 551 TGGIGTAIFVAIGAAVMAFAVKGMK 575 

TGG G IF+AIG +M A G K 
Sbjct: 475 TGGKGIYIFLAIGIVIMIVAFGGYK 499 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8780 (GBS80) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 16 (lane 6; MW 56.8kDa). 

The GBS80-His fusion product was purified (Figure 104A; see also Figure 194, lane 5) and used to 
immunise mice (lane 1+2 product; 20µg/mouse). The resulting antiserum was used for Western blot (Figure 
104B), FACS (Figure 104C), and in the in vivo passive protection assay (Table III). These tests confirm 
that the protein is immunoaccessible on GBS and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1275 

A DNA sequence (GBSx1352) was identified in S.agalactiae <SEQ ID 3923> which encodes the amino 
acid sequence <SEQ ID 3924>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4043 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
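
The "Final Results" blocks in these Examples all follow one pattern: a compartment name, a certainty value and a verdict. The following is a minimal Python sketch of how such a block could be read programmatically, using the values reported for SEQ ID 3924. The function name `parse_final_results` and the regular expression are illustrative assumptions, not part of any localisation program; the pattern deliberately tolerates stray whitespace inside the number.

```python
import re

# One localisation line: compartment, certainty value, verdict in parentheses.
LINE_RE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\b.*?"
    r"Certainty\s*=\s*([0-9. ]+?)\s*\(([^)]*)\)"
)

def parse_final_results(text):
    """Map each compartment to a (certainty, verdict) pair."""
    results = {}
    for m in LINE_RE.finditer(text):
        compartment, raw_value, verdict = m.groups()
        # Remove any whitespace embedded in the number, e.g. "0 . 0000".
        results[compartment] = (float(raw_value.replace(" ", "")), verdict)
    return results

block = """
bacterial cytoplasm --- Certainty=0.4043 (Affirmative) < succ>
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0. 0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
best = max(results, key=lambda name: results[name][0])
print(best, results[best])
```

Taking the compartment with the highest certainty as the predicted location is a simplification made for this sketch.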

Example 1276 

A DNA sequence (GBSx1353) was identified in S.agalactiae <SEQ ID 3925> which encodes the amino 
acid sequence <SEQ ID 3926>. This protein is predicted to be MsmR. Analysis of this protein sequence 
reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.01 Transmembrane 75 - 91 ( 75 - 92) 

Final Results 

bacterial membrane --- Certainty=0.1404 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9679> which encodes amino acid sequence <SEQ ID 9680> 
was also identified. 

SEQ ID 3926 (GBS360) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 81 (lane 9; MW 74kDa). 

GBS360-GST was purified as shown in Figure 216, lane 8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1277 

A DNA sequence (GBSx1354) was identified in S.agalactiae <SEQ ID 3927> which encodes the amino 
acid sequence <SEQ ID 3928>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1762 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3929> which encodes the amino acid 
sequence <SEQ ID 3930>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1640 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 93/98 (94%), Positives = 96/98 (97%) 

Query: 1 MDKIIKSISASGAFRSYVLDSTETVKLAQEKHHTLSSSTVALGRTLIANQILAANQKGDS 60 

MDKIIKSI+ SGAFR+YVLDSTETV LAQEKH+TLSSSTVALGRTLIANQILAANQKGDS 
Sbjct: 1 MDKIIKSIAQSGAFRAYVLDSTETVALAQEKHNTLSSSTVALGRTLIANQILAANQKGDS 60 


Query: 61 KITVKVIGDSSFGHIISVADTKGHVKGYIQNTGVDIKK 98 

KITVKVIGDSSFGHIISVADTKGHVKGYIQNTGVDIKK 
Sbjct: 61 KITVKVIGDSSFGHIISVADTKGHVKGYIQNTGVDIKK 98 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
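
The identity and positive counts quoted for the GAS/GBS alignment in this Example can be recomputed from the three alignment rows. The following is a minimal Python sketch, assuming the usual convention that the middle row shows a letter for an identity, "+" for a conservative substitution and a space for a mismatch. The function name `alignment_stats` is hypothetical; the rows are copied from the alignment above.

```python
def alignment_stats(query, match, subject):
    """Return (identities, positives, length) for one alignment block."""
    if not (len(query) == len(match) == len(subject)):
        raise ValueError("alignment rows must have equal length")
    # Identity: the same residue in both rows (gap characters excluded).
    identities = sum(q == s and q != "-" for q, s in zip(query, subject))
    # Positive: any non-blank character in the middle row.
    positives = sum(c != " " for c in match)
    return identities, positives, len(query)

tail = "KITVKVIGDSSFGHIISVADTKGHVKGYIQNTGVDIKK"
blocks = [
    ("MDKIIKSISASGAFRSYVLDSTETVKLAQEKHHTLSSSTVALGRTLIANQILAANQKGDS",
     "MDKIIKSI+ SGAFR+YVLDSTETV LAQEKH+TLSSSTVALGRTLIANQILAANQKGDS",
     "MDKIIKSIAQSGAFRAYVLDSTETVALAQEKHNTLSSSTVALGRTLIANQILAANQKGDS"),
    (tail, tail, tail),  # second block is identical in all three rows
]

ident = pos = length = 0
for block in blocks:
    i, p, n = alignment_stats(*block)
    ident, pos, length = ident + i, pos + p, length + n

# Whole-number percentages obtained by integer division.
pct_ident, pct_pos = 100 * ident // length, 100 * pos // length
print(f"Identities = {ident}/{length} ({pct_ident}%), "
      f"Positives = {pos}/{length} ({pct_pos}%)")
```

With these rows the sketch gives 93/98 and 96/98, and integer division yields the 94% and 97% quoted in the alignment header.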

Example 1278 

A DNA sequence (GBSx1355) was identified in S.agalactiae <SEQ ID 3931> which encodes the amino 
acid sequence <SEQ ID 3932>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC98436 GB:L29324 unknown [Streptococcus pneumoniae] 
Identities = 34/48 (70%), Positives = 39/48 (80%) 

Query: 1 MQEVLIIARENHQVTHEHVSILLTCVQELIVEVNQTQPLSREFREKYM 48 

+ EV IIA+ NHQ VTHEHVS ILLTC+QELI EV +T PLS +F KYM 
Sbjct: 70 VHEVFIIAKTNHQVTHEHVSILLTCIQELIKEVEKTGPLSEDFCNKYM 117 


No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1279 

A DNA sequence (GBSx1356) was identified in S.agalactiae <SEQ ID 3933> which encodes the amino 
acid sequence <SEQ ID 3934>. This protein is predicted to be TnpA (orfB). Analysis of this protein 
sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5248 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9907> which encodes amino acid sequence <SEQ ID 9908> 
was also identified. A further related GBS nucleic acid sequence <SEQ ID 9677> which encodes amino acid 
sequence <SEQ ID 9678> was also identified. A further related GBS nucleic acid sequence <SEQ ID 
10911> which encodes amino acid sequence <SEQ ID 10912> was also identified. 

There is homology to SEQ ID 1336. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1280 

A DNA sequence (GBSx1357) was identified in S.agalactiae <SEQ ID 3935> which encodes the amino 
acid sequence <SEQ ID 3936>. Analysis of this protein sequence reveals the following: 

Possible site: 45 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4489 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB64982 GB:U43834 Ydr540cp [Saccharomyces cerevisiae] 
Identities = 93/171 (54%), Positives = 121/171 (70%), Gaps = 3/171 (1%) 

Query: 1 MRVYENKEELKKEISKTFEKyi^ETOIPENLKDKRIDEvDRTPAANLSYQVGWTNLVLK 60 

MR Y +K+ELK+EI K +EKY EF I E+ KD++++ VDRTP+ NLSYQ+GW NL+L+ 
Sbjct: 1 MREYTSKKELKEEIEKKYEKYDAEFETISESQKDEKVETVDRTPSENLSYQLGWVNLLLE 60 

Query: 61 WEEDERKGLQVKTPSDKFKWNQLGELYQWFTDTYAHLSLQELKAKLNENINSIYAMIDLL 120 

WE E G V+TP+ +KWN LG LYQ F Y S++E +AKL E +N +Y I L 
Sbjct: 61 WEAKEI AGYNVETPAPGYKWNNLGGLYQSFYKKYG I YS I KEQRAKLREAVNEVYKWI STL 120 

Query: 121 SEEELFEAHMRKWADEATKTATWEVYKFIHVNTVAPFGTFRTKIRKWKKIV 171 
S++ELF+A RKW AT A W VYK+IH+NTVAPF FR KIRKWK++V 

Sbjct: 121 SDDELFQAGNRKW ATTKAMWPVYKWIHINTVAPFTNFRGKIRKWKRLV 168 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1281 

A DNA sequence (GBSx1358) was identified in S.agalactiae <SEQ ID 3937> which encodes the amino 
acid sequence <SEQ ID 3938>. Analysis of this protein sequence reveals the following: 

Possible site: 28 
>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -3.45 Transmembrane 10 - 26 ( 2 - 26) 

Final Results 

bacterial membrane --- Certainty=0.2381 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 8781> which encodes amino acid sequence <SEQ ID 8782> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: 8.80 
GvH: Signal Score (-7.5): -3.94 

Possible site: 28 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 1 value: -3.45 threshold: 0.0 
INTEGRAL Likelihood = -3.45 Transmembrane 7 - 23 ( 2 - 26) 
PERIPHERAL Likelihood = 10.40 69 
modified ALOM score: 1.19 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.2381 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA68889 GB:Y07615 acid phosphatase [Haemophilus influenzae] 
Identities = 112/245 (45%), Positives = 148/245 (59%), Gaps = 10/245 (4%) 

Query: 5 MKKVLVS S LLVLG I T I TLQTVVERKGPKVAYTQEGMTALSDTNKDKVTTI SIDEIQKSLE 64 

MK V+ S++ L +T V G YTQ G A + + IS+D+I++SLE 

Sbjct: 1 MKNVMKLSVIAL LTARAVPAMAGKTEPYTQSGTNAREMLQEQAIHWISVDQIKQSLE 57 

Query: 65 GKKPITVSFDIDDTLLFSSQYFQYGKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYA 124 

GK PI VSFDIDDT+LFSS F +G++ +PG D+L Q FW+ V D+ SIPK+ A 
Sbjct: 58 GKAPINVSFDIDDTVLFSSPCFYHGQQKFSPGKHDYLKNQDFWNEVNAGCDKYSIPKQIA 117 

Query: 125 KKLIAMHQKRGDKIVFITGRTRGSMYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKP 184 

LI MHQ RGD++ F TGRT G+VD LKF+ V + G + ++ 

Sbjct: 118 IDLINMHQARGDQVYFFTGRT AGKVDGVTPILEKTFNIKNMHPVEFMGSR-ERT 170 

Query: 185 YKYDKSYYIKKYGSDIHYGDSDDDIHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVL 244 

KY+K+ I + IHYGDSDDD+ AA+EAG R IR++RA NST P+P GGYGEEVL 
Sbjct: 171 TKYNKTPAIISHKVSIHYGDSDDDVLAAKEAGVRGIRLMRAANSTYQPMPTLGGYGEEVL 230 

Query: 245 ENSAY 249 
NS+Y 

Sbjct: 231 INSSY 235 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3939> which encodes the amino acid 
sequence <SEQ ID 3940>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -3.98 Transmembrane 6 - 22 ( 4 - 25) 

Final Results 

bacterial membrane --- Certainty=0.2593 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:CAA68889 GB:Y07615 acid phosphatase [Haemophilus influenzae] 
Identities = 105/237 (44%), Positives = 141/237 (59%), Gaps = 10/237 (4%) 

Query: 9 LFTVSFCGIIALPVEASGPKVPYTQEGITA--ISNQATVKLISIADIASSLEGQKPITVS 66 

L ++ A+P A G PYTQ G A + + + IS+ I SLEG+ PI VS 

Sbjct: 7 LSVIALLTAAA.VPAMA-GKTEPYTQSGTNAREMLQEQAIHWISVDQIKQSLEGKAPINVS 65 

Query: 67 FDIDDTLLFTSQYFQYGKEYITPGSFDFLHRQKFWDLVAKRGDQDSIPKEYAKQLIAMHQ 126 

FDIDDT+LF+S F +G++ +PG D+L Q FW+ V D+ SIPK+ A LI MHQ 
Sbjct: 66 FDIDDTVLFSSPCFYHGQQKFSPGKHDYLKNQDFWNEVNAGCDKYSIPKQIAIDLINMHQ 125 

Query: 127 KRGDKIVFITGRTRGSmKKGEIDKTAKSLAKDFKLDKPIAINYTGDKAVKPYQYDKTYY 186 

RGD++ F TGRT G++D LKF+ + + G+ + +Y+KT 

Sbjct: 126 ARGDQVYFFTGRT AGKVDGVTPI LEKTFNI KNMHP VEFMGSRE - RTTKYNKTPA 178 

Query: 187 IKKNGSQIHYGDSDEDINAAKEAGARPIRILRAPNSTNLPLPKAGGYGEEVLENSAY 243 

I + IHYGDSD+D+ AAKEAG R IR++RA NST P+P GGYGEEVL NS+Y 
Sbjct: 179 IISHKVSIHYGDSDDDVLAAKEAGVRGIRLMRAANSTYQPMPTLGGYGEEVLINSSY 235 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 196/245 (80%), Positives = 216/245 (88%), Gaps = 2/245 (0%) 

Query: 5 MKK^VSSLLVLGITITLQTVVEAKGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLE 64 

MKK S L + + VFA GPKV YTQEG+TA+S N+ V ISI +1 SLE 

Sbjct: 1 MKKEFTSILFTVSFCGIIALPVEASGPKVPYTQEGITAIS--NQATVKLISIADIASSLE 58 


Query: 65 GKKPITVSFDIDDTLLFSSQYFQYGKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYA 124 

G+KPITVSFDIDDTLLF+SQYFQYGKEY+TPGSFDFLHKQKFWDLVAKRGDQDSIPKEYA 
Sbjct: 59 GQKPITVSFDIDDTLLFTSQYFQYGKEYITPGSFDFLHKQKFWDLVAKRGDQDSIPKEYA 118 

Query: 125 KKLIAMHQKRGDKIVFITGRTRGSMYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKP 184 

K+LIAMHQKRGDKIVFITGRTRGSmK+GE+DKTAK+LAKDFKLDKPIA+NYTGDK KP 
Sbjct: 119 KQLIAMHQKRGDKIVFITGRTRGS^KKGEIDKTAKSLAKDFKLDKPIAINYTGDKAVKP 178 

Query: 185 YKYDKSYYIKKYGSDIHYGDSDDDIHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVL 244 
Y+YDK+YYIKK GS IHYGDSD+DI+AA+EAGARPIRILRAPNSTNLPLP+AGGYGEEVL 

Sbjct: 179 YQYDKTYYIKKNGSQIHYGDSDEDINAAKEAGARPIRILRAPNSTNLPLPKAGGYGEEVL 238 

Query: 245 ENSAY 249 
ENSAY 

Sbjct: 239 ENSAY 243 

SEQ ID 8782 (GBS100) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 16 (lane 5; MW 28kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 33 (lane 2; MW 53kDa). 

The GBS100-GST fusion product was purified (Figure 106A; see also Figure 197, lane 4) and used to 
immunise mice (lane 1 product; 9.9µg/mouse). The resulting antiserum was used for Western blot (Figure 
106B), FACS (Figure 106C), and in the in vivo passive protection assay (Table III). These tests confirm 
that the protein is immunoaccessible on GBS and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1282 

A DNA sequence (GBSx1359) was identified in S.agalactiae <SEQ ID 3941> which encodes the amino 
acid sequence <SEQ ID 3942>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3288 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1283 

A DNA sequence (GBSx1360) was identified in S.agalactiae <SEQ ID 3943> which encodes the amino 
acid sequence <SEQ ID 3944>. Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4004 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9675> which encodes amino acid sequence <SEQ ID 9676> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04406 GB:AP001509 RNA methyltransferase [Bacillus halodurans] 
Identities = 198/452 (43%), Positives = 300/452 (65%) 

Query: 12 KRKIMLHKNDIIETEISDISHEGMGIAKVDGFVFFVENALPGEIIKMRVLKLRKRIGYGK 71 

K++ ++KND++E I D++H+G G+AKVDG+ F+ ALPGE +K +V+K++K G+G+ 
Sbjct: 3 KQQAPVNKNDVVEVTIEDLTHDGAGVAKVDGYADFIPKALPGERLKAKVVKVKKGYGFGR 62 

Query: 72 VEEYLTTSPHRNEGLDYTYLRTGIADLGHLTYEQQLLFKQKQVADNLYKIAHISDVLVEP 131 

V +SPRE ++G L H++Y+ QL +KQKQV D L +1 1+ V V P 

Sbjct: 63 VLNMIEASPDRVEAPCPVFNQCGGCQLQHMSYDAQLRYKQKQVQDVLERIGKITAVTVRP 122 

Query: 132 TLGMTIPLAYRNKAQVPVRRVDGQLETGFFRKNSHTLVSIEDYLIQEKEIDALINFTRDL 191 

T+GM P YRNKAQVPV +G L GF+++ SH ++ +++ +IQ +E D +1 ++L 
Sbjct: 123 TIGMNEPWRYRNKAQVPVGEREGGLIAGFYQERSHRIIDMDECMIQHEENDKVIRQVKEL 182 

Query: 192 LRKFDVKPYDEEQQSGLIRNLWRRGHYTGQLMLVLVTTRPKIFRIDQMIEKLVSAFPSV 251 
R+ ++ YDEE+ G +R++V R G TG++M+VL+T ++ +IE++ A P V 

Sbjct: 183 ARELGIRGYDEEKHRGTLRHVVARYGKNTGEIMVVLITRGEEIjPHKKTLIERIHKAIPHV 242 

Query: 252 VSIMQNINDRNSNVIFGKEFRTLYGSDTIEDQMLGNTYAISAQSFYQVNTEMAEKLYQKA 311 
SI+QN+N + +NVIFG + + L+G + I D + +AISA+SFYQVN E + LY +A 
Sbjct: 243 KSIVQNVNPKRTNVIFGDKTKVLWGEEYIYDTIGDIKFAISARSFYQVNPEQTKVLYDQA 302 

Query: 312 IDFSDLNSEDIVIDAYSGIGTIGLSVAKQvKHVYGVEvVEKAVSDAKENATRNGITNSTY 371 

++F++L + VIDAY GIGTI L +A+Q KHVYGVE+V +A+SDAK NA NG N + 
Sbjct: 303 LEFANLTGSEWIDAYCGIGTISLFIACflAKHvYGVEIVPEAISDAKRNARLNGFANVQF 362 


Query: 372 VADSAENAMAKOTiKEGIKPWIMVDPPRKGLTESFVYSAAQTKADKITYISCNSATMARD 431 

AE M W +G++ VI+VDPPRKG E+ + + K D++ Y+SCN AT+ARD 
Sbjct: 363 AVGDAEKVMPWWYAQGVRADVIWDPPRKGCDEALL^ 422 

Query: 432 IKLFEELGYHLVKIQPVDLFPMTHHVECVALL 463 

+++ E+ GY +QPVD+FP T H+E VA+L 
Sbjct: 423 LRVLEDGGYETKDVQPVDMFPWTTHIESVAVL 454 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3945> which encodes the amino acid 
sequence <SEQ ID 3946>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1262 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 332/454 (73%), Positives = 387/454 (85%) 



Query: 12 KRKIMLHKNDIIETEISDISHEGMGIAKVDGFVFFVENALPGEIIKMRVLKLRKRIGYGK 71 
KK MJj KNDXX+ iSD+bHEG Qr+AK DGFVFFV+NAIjP E+l MRVJjK+ K G+bK 
Sbjct: 8 KRIRMLKKI^IIQVAISDLSHEGAGVAKHDGFVFFVDNALPEEV 67 

Query: 72 VEEYLTTSPHRNEGLDYTYLRTGIADLGHLTYEQQLLFKQKQVADNLYKIAHISDVLVEP 131 
VHi I b KJM ++ 1 xljKICalADijCarlijl in Qh fcK+KQV U+IjlKJ-A lt>UV Vh* 
Sbjct: 68 VEAYHYLSSARNADVNLTYLRTGIADLGHLTYEDQLTFKKKQVQDSLYKIAGISDVTVES 127 

Query: 132 TLGMTIPLAYRNKAQVPVRRVDGQLETGFFRKNSHTLVSIEDYLIQEKEIDALINFTRDL 191 
T+GMT PLAYRNKAQVPVRRV+GQLETGFFRK+SH L+ I DY IQ+KEID LINFTRDL 
Sbjct: 128 TIGMTEPLAYRNKAQVPVRRVNGQLETGFFRKHSHDLIPISDYYIQDKEIDRLINFTRDL 187 

Query: 192 LRKFDVKPYDEEQQSGLIRNLWRRGHYTGQLMLVLVTTRPKIFRIDQMIEKLVSAFPSV 251 
LR+FD+KPYDE +Q+GL+RN+WRRGHY+G++MLVLVTTRPK+FR+DQ+IEK+V AFP+V 
Sbjct: 188 LRRFDIKPYDETEQTGLLRNIVWRGHYSGEMMLVLVTTRPKVFRVDQVIEKIVEAFPAV 247 

Query: 252 VSIMQNINDRNSNVIFGKEFRTLYGSDTIEDQMLGNTYAISAQSFYQVNTEMAEKLYQKA 311 
VSI+QNIND+N+N IFGK+F+TLYG DTI D MLGN YAISAQSFYQVNT MAEKLYQ A 
Sbjct: 248 VSIIQNINDKNTNAIFGKDFKTLYGKDTITDSMLGNNYAISAQSFYQVN^ 307 

Query: 312 IDFSDLNSEDIVIDAYSGIGTIGLSVAKQVKHVYGVEWEKAVSDAKENATRNGITNSTY 371 
I FSDL+ +DIVIDAYSGIGTIGLS AK VK VYGVEV+E AV DA++NA NGITN+ + 
Sbjct: 308 IAFSDLSKDDIVIDAYSGIGTIGLSFAKTVKAVYGVEVIEAAVRDAQQNAALNGITNAYF 367 

Query: 372 VADSAENAMAKWLKEGIKPWIMVDPPRKGLTESFVYSAAQTKADKITYISCNSATMARD 431 
VAD+AE+AMA W K+GIKP+VI+VDPPRKGLTESF+ +4- KITY+SCN ATMARD 
Sbjct: 368 VADTAEHAMATWAKDGIKPSVILVDPPRKGLTESFIQASVAMGPQKITYVSCNPATMARD 427 

Query: 432 IKLFEELGYHLVKIQPVDLFPMTHHVECVALLVK 465 
IK ++ELGY L K+QPVDLFP THHVECV LL+K 
Sbjct: 428 IKRYQELGYKLAKVQPVDLFPQTHHVECWLLIK 461 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1284 

A DNA sequence (GBSx1361) was identified in S.agalactiae <SEQ ID 3947> which encodes the amino 
acid sequence <SEQ ID 3948>. This protein is predicted to be PSR protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-12.15 Transmembrane 135 - 151 ( 127 - 155) 

Final Results 

bacterial membrane --- Certainty=0.5861 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB76822 GB:AJ276232 PSR protein [Enterococcus faecalis] 
Identities = 143/409 (34%), Positives = 206/409 (49%), Gaps = 56/409 (13%) 

Query: 48 QRRTES PP - - TNSYYEEPYSDSYYQDDDFYSEPQLTSQGLP I YQEERAPKKKKQRARKEK 105 

+ R E P S E Y DSY +D T G ++ P+ KK + K+K 

Sbjct: 31 EHREEEPEELAESLQEPVYEDSYTEDSRRSERRHQTDSGGG-NGSDQPPRGKKDKKPKKK 89 

Query: 106 QRVK^MAPFPPKAITPPRKKKKFKGPLKFIGIILLIVLSGMVFMFVKGMRDVNNGKSHYS 165 
RKK K K F K++ I+L+++ + MF+KG + S 

Sbjct: 90 RKKSKTKRFFKWL VILLILLFAYSTVMFLKGKSAAEHDDS - LP 131 

Query: 166 PAIIEDFKGKDAVDGT-NILILGSDKRVSERSTDARTDTIWANVGNKDNKVKMVSFMRD 224 
+E F G + +G NILILGSD R + R DTIMV + K K++SFMRD 

Sbjct: 132 QEKVETFNGVKSSNGAKNILILGSDTRGEDAG RADTIMVLQLNGPSKKPKLISFMRD 188 

Query: 225 LLWIPOTSTEGYYDMKimSFNLGEQDJSmKGAEYTOQTLKNHFDIDIKYY™ 284 

V+IP G K+NA++ G GAE VR+TLK +F++D KYY VDF++F 

Sbjct: 189 TFVDIP GVGPNKINAAYAYG GAELVRETLKQNFNLDTKYYAKVDFQSFE 237 

Query: 285 DAIDTLFPNGVK1NAKFGLVGGQSADSVKVPDDLRMKNGWPSQKIKVGIQYMDGRTLLN 344 

+D++FP GVKI+A+ L + D V 1+ G Q MDG LL 

Sbjct: 238 KIVDSMFPKGVKIDAEKSL NLDGVD IEKGQQVMDGHVLLQ 277 

Query: 345 YARFRKDDDGDFGRTQRQQQVMRAIVSQIKDPRRLFTGSAAIGK&YALTSSNLSYSFVLT 404

YARFR D++GDFGR +RQQQVM A++SQ+K+P L ++GK S+++ SF+LT 

Sbjct: 278 YARFRMDEEGDFGRVRRQQQVMSAVMSQMKNPMTLLRTPESLGKLVGYMSTDVPVSFMLT 337 

Query: 405 DGIPILSDAKNGIKQMTIPREGDWVDDYDQYGGQGLTIDFAKYKKILKK 453 
+G +L K G++ +++P W Y G L +D K ++K

Sbjct: 338 NGPSLLIKGKTGVESLSVPVPDSWNFGESSYAGSILEVDEQKNftDAIEK 386 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3949> which encodes the amino acid 
sequence <SEQ ID 3950>. Analysis of this protein sequence reveals the following: 

Possible site: 49

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -7.96 Transmembrane 159 - 175 ( 152 - 180) 

Final Results

bacterial membrane Certainty=0.4185 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases:

>GP:CAB76822 GB:AJ276232 PSR protein [Enterococcus faecalis] 
Identities = 140/345 (40%) , Positives = 195/345 (55%) , Gaps = 41/345 (11%) 

Query: 140 PRSQK RKHKKKGCMKWFFNILGLLLMTVLMGLGLMFAKGVFDISTNKANYKPAVSQ 195 

PR +K +K +KK K FF L +LL+ + +MF KG + + + V +

Sbjct: 78 PRGKKDKKPKKKKKKSKTKRFFKWLVILLILLFAYSTVMFLKGKSAAEHDDSLPQEKV-E 136 

Query: 196 AFDGQETQDGT-NILILGSDQRVTQGSTDARTDTIMVVNVGNHAKKIKMVSFMRDTLINI 254 
F+G ++ +G NILILGSD T+G R DTIMV+ + +KK K++SFMRDT ++I 
Sbjct: 137 TFNGVKS SNGAKNI LILGSD TRGEDAGRADTIMVLQLNGPSKKPKLI S FMRDTFVDI 193

Query: 255 PGYSYNDNSYDLKLNSAFNLGEQEDHHGAEYvIa^KHNFDIDIKYYV^WDFETFAEAID 314 

PG N K+N+A+ G GAE VR LK NF++D KYY VDF++F + +D 

Sbjct: 194 PGVGPN KINAAYAYG GAEL VRETLKQNFNLDTKYYAKVDFQS FEKI VD 241 



Query: 315 TLFPNGVKIDAKFATVGGVAVDSVEVPDDLRMKNGWPNQTIEVGEQRMDGRTLLNYARF 374 

++FP GVKIDA+ + + +D V+ IE G+Q MDG LL YARF 

Sbjct: 242 SMFPKGVKIDAEKS LNLDGVD IEKGQQVMDGHVLLQYARF 281

Query: 375 RKDDEGDFGRTTOQQQVMSAVMSQIKDPTKLFTGS2UVIGKIYALTSTNVSFPFWKNGVS 434

R D+EGDFGR RQQQVMSAVMSQ+K+P L ++GK+ ST+V F++ NG S 

Sbjct: 282 RMDEEGDFGRVRRQQQVMSAVMSQMKNPMTLLRTPESLGKLVGYMSTDVPVSFMLTNGPS 341 

Query: 435 VLGSGKNGvEHVTI PENGDWVDEYDMVGGQALYIDFDKYQKTLAK 479 

+L GK GVE +++P W YGL+DK +K 

Sbjct: 342 LLIKGKTGVESLSVPVPDSWNFGESSYAGSILEVDEQKNADAIEK 386 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 273/486 (56%) , Positives = 340/486 (69%) , Gaps = 32/486 (6%) 



Query: 1 MSRJ^GQLNHHEELRYOTLLKNIHYLNEREKMEFQYLHYKKTAVRPQRRTESPPTNSYY 60
         M++ G L+HHEELRY YLL+N+ YL+E EK EF +L K R ++ S
Sbjct: 1 MTKYPMGGLSHHEELRYFYLLRNLSYLSENEKKEFAFLKSKLEIGRAYAPSKQHYRKSKR 60

Query: 61 EEPY-SDSYY QDDDFYSEPQLTSQGLPIYQEERAPKKKKQRARKEKQRVKV 110
          +EPY D YY +DDD + GLPIY +E KK K R +
Sbjct: 61 QEPYFEDDYYNDYSPNDLLEDDDVNHDSSFVPYGLPIYPKEDRYIiNKKT KLTARRPI 117

Query: 111 MAPFP PKAITPPRKKKK-FKGFLKFIGI ILLI VLSGMVFMFVK 152
           AP P P++ KKK K F +G++L+ VL G+ MF K
Sbjct: 118 DAPQPIDEDDAFLTESVARCALPRSQKRKHKKKGCMKWFFNILGLLLMTVLMGLGLMFAK 177

Query: 153 G^DVIJNGKSHYSPAIIEDFKGKDAVDGTNILILGSDKRVSERSTDARTDTIMVANVGNK 212
           G+ D++ K++Y PA+ + F G++ DGTNILILGSD+RV++ STDARTDTIMV NVGN
Sbjct: 178 GVFDISTNKANYKPAVSQAFDGQETQDGTNILILGSDQRVTQGSTDARTDTIMWNVGNH 237

Query: 213 DNKVKMVSFMRDLLVNIPOTS-TEGYYDMKI^ASFNLGEQDNHKGAEYVRQTLKNHFDID 271
           K+KMVSFMRD L+NIP YS + YD+KLN++FNLGEQ++H GAEYVR+ LK++FDID
Sbjct: 238 AKKIKMVSFMRDTLINIPGYSYNDNSYDLKLNSAFNLGEQEDHHGAEYVRRALKHNFDID 297

Query: 272 IKYYvMVDFETFADAIDTLFP^VKINRKFGLVGGQSADSVKVPDDLRMKNGVVPSQKIK 331
           IKYYVMVDFETFA+AIDTLFPNGVKI+AKF VGG + DSV+VPDDLRMKNGWP+Q 1+
Sbjct: 298 IKYYvMvDFETFAEAIDTLFPNGvTaDAKFATOGGVAVDSvEVPDDLRMKNGWPNQTIE 357

Query: 332 VGIQYMDGRTLLNYARFRKDDDGDFGRTQRQQQVMRAIVSQIKDPRRLFTGSAAIGKAYA 391
           VG Q MDGRTLLNYARFRKDD+GDFGRT RQQQVM A++SQIKDP +LFTGSAAIGK YA
Sbjct: 358 VGEQRMDGRTLLNYARFRKDDEGDFGRTWQQQVMSAVMSQIKDPTKLFTGSAAIGKIYA 417

Query: 392 LTSSNLSYSFVLTDGIPILSDAKNGIKQMTIPREGDWVDDYDQYGGQGLTIDFAKYKKIL 451
           LTS+N+S+ FV+ +G+ +L KNG++ +TIP GDWVD+YD YGGQ L IDF KY+K L
Sbjct: 418 LTSTIWSFPFVVKNGVSVIjGSGKNGVEHVTIPENGDWVDEYDMYGGQALYIDFDKYQKTL 477

Query: 452 KKMGLR 457
           K+GLR
Sbjct: 478 AKLGLR 483





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
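
The transmembrane predictions reported in this Example take the form "INTEGRAL Likelihood = -12.15 Transmembrane 135 - 151 ( 127 - 155)". The following Python fragment is a minimal sketch of how such a line might be read into a structured record. The names `TransmembraneSegment` and `parse_integral` are hypothetical and are not part of the prediction software; the interpretation of the parenthesised pair as an extended range is likewise an assumption made for illustration only.

```python
import re
from typing import NamedTuple, Optional


class TransmembraneSegment(NamedTuple):
    likelihood: float      # score printed after "Likelihood ="
    start: int             # first residue of the predicted segment
    end: int               # last residue of the predicted segment
    extended_start: int    # first number in parentheses (meaning assumed)
    extended_end: int      # second number in parentheses (meaning assumed)


# Whitespace is matched loosely because the spacing in the printed
# output varies ("=-12.15", "= -7.96", "( 6-22)").
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line: str) -> Optional[TransmembraneSegment]:
    """Return the segment described by one INTEGRAL line, or None."""
    match = _INTEGRAL.search(line)
    if match is None:
        return None
    likelihood = float(match.group(1))
    start, end, ext_start, ext_end = (int(g) for g in match.groups()[1:])
    return TransmembraneSegment(likelihood, start, end, ext_start, ext_end)


segment = parse_integral(
    "INTEGRAL Likelihood =-12.15 Transmembrane 135 - 151 ( 127 - 155)"
)
segment_length = segment.end - segment.start + 1
print(segment, segment_length)
```

Lines that do not contain an INTEGRAL record yield `None`, so the function may be applied to every line of an analysis block in turn.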

Example 1285 

A DNA sequence (GBSx1362) was identified in S.agalactiae <SEQ ID 3951> which encodes the amino
acid sequence <SEQ ID 3952>. This protein is predicted to be shikimate kinase (aroK). Analysis of this 
protein sequence reveals the following: 

Possible site: 17

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA55181 GB:X78413 shikimate kinase [Lactococcus lactis] 
Identities = 65/164 (39%) , Positives = 98/164 (59%) , Gaps = 8/164 (4%) 

Query: 1 MPKVLLGFMGVGKTSVANCLENEVIDMDSLIEKHIGMSISRFFTEEGEASFRALESQFLN 60

M +L+GFMG GK++VA L E D+D LIE+ I M 1+ FF GEA FR +E++ 
Sbjct: 1 MSIILIGFMGAGKSTVAKLLAEEFTDLDKLIEEEIEMPIATFFELFGEADFRKIEKEVFE 60 

Query: 61 ELLKKKNEGLVIASGGGI VLLEENRRLLTIjNRHIWIL-LTGSFEvLYHRIKKDEKNRRPL 119 
++K ++IA+GGGI+ E + L L+R + ++ LT F+ L+ RI D +N RP

Sbjct: 61 LAVQK DI I IATGGGI I - - ENPKNLNVLDRASRWFLTADFDTLWKRISMDWQNVRP- 114 

Query: 120 FLNHSKEEFYDIYQKRMLLYSGLSDMIIDTDYLTPQKIATVIGE 163 
L KE +++KRM YS ++D+ ID +P++IA I E 
Sbjct: 115 -LAQDKEAAQLLFEKRMKDYSLVADLTIDVTDKSPEQIAEQIRE 157

A related DNA sequence was identified in S. pyogenes <SEQ ID 3953> which encodes the amino acid 
sequence <SEQ ID 3954>. Analysis of this protein sequence reveals the following: 

Possible site: 43

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

>GP:CAA55181 GB:X78413 shikimate kinase [Lactococcus lactis] 
Identities = 63/160 (39%) , Positives = 97/160 (60%) , Gaps = 5/160 (3%) 

Query: 1 MTKVLLGFMGVGKTTVSECHLSMHCKDMDAIIEAKIGMSIAAFPEQHGEIAFRTIESQVLK 60 

M+ +L+GFMG GK+TV+K L+ D+D +IE +1 M IA FFE GE FR IE++V + 
Sbjct: 1 MSIILIGFMGAGKSTVAKLLAEEFTDLDKLIEEEIEMPIATFFELFGEADFRKIEMEVFE 60 

Query: 61 DLLFANDNS 1 1 VTGGGVWLQENRQLLRKNHQHN I LL VAS FETLYQRLKHDKKSQRPLFL 120

L +11 TGGG++ +N +L + + L A F+TL++R+ D ++ RP L 

Sbjct: 61 --IAVQKDIIIATGGGIIFJNPKNLNVLDR-ASRWFLTADFDTLWKRISMDWQNVRP--L 115 

Query: 121 KYSKEAFYEFYQQRMVFYEGLSDL VTR VDHRTPEEVANI I 160 
KEA +++RM Y ++DL I V ++PE++A I

Sbjct: 116 AQDKEAAQLLFEKRMKDYSLVADLTIDVTDKSPEQIAEQI 155 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 88/161 (54%) , Positives = 120/161 (73%) , Gaps = 1/161 (0%) 

Query: 1 MPKVLLGFMGVGKTSVANCLENEVIDMDSLIEKHIGMSISRFFTEEGEASFRALESQFLN 60 

M KVLLGFMGVGKT+V+ L DMD++IE IGMSI+ FF + GE +FR +ESQ L 

Sbjct: 1 MTKVLLGFMGVGKTTVSKHLSMHCKDMDAI IEAKIGMS IAAFFEQHGEIAFRTIESQ VLK 60 

Query: 61 ELLKKKNEGLVIASGGGI VTjLEENRRLLTLNRHNNILLTGSFEVLYHRIKKDEKNRRPLF 120

+LL N+ +1 +GGG+V+L+ENR+LL N +NILL SFE LY R+K D+K++RPLF 
Sbjct: 61 DLLFA-NDNSIIVTGGGVVVLQENRQLLRKNHQHNILLVASFETLYQRLKHDKKSQRPLF 119 

Query: 121 LNHSKEEFYDIYQKRMLLYSGLSDMIIDTDYLTPQKIATVI 161 
L +SKE FY+ YQ+RM+ Y GLSD++I D+ TP+++A +1

Sbjct: 120 LKYSKEAFYEFYQQRMVFYEGLSDLVIRVDHRTPEEVANII 160 



SEQ ID 3952 (GBS 152) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 25 (lane 2; MW 20kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 37 (lane 2; MW 45.5kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1286 

A DNA sequence (GBSx1363) was identified in S.agalactiae <SEQ ID 3955> which encodes the amino
acid sequence <SEQ ID 3956>. This protein is predicted to be 3-phosphoshikimate
1-carboxyvinyltransferase (aroA). Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.81 Transmembrane 241 - 257 ( 240 - 257) 
INTEGRAL Likelihood = -0.06 Transmembrane 390 - 406 ( 390 - 406) 



A related GBS nucleic acid sequence <SEQ ID 9673> which encodes amino acid sequence <SEQ ID 9674> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD45819 GB:AF169483 5-enolpyruvylshikimate-3-phosphate synthase
[Streptococcus pneumoniae]
Identities = 288/426 (67%) , Positives = 347/426 (80%) 



Query: 5 MKLLTNANTLKGTIRVPGDKSISHRAIIFGSISQGVTRIVDVLRGEDVLSTIEAFKQMGV 64
         MKL TN L G IRVPGDKSISHR+IIFGS+++G T++ D+LRGEDVLST++ F+ +GV
Sbjct: 1 MKLKTNIRHLHGIIRVPGDKSISHRSIIFGSLAEGETKvYDILRGEDVLSTMQVFRDLGV 60

Query: 65 LIEDDGEIITIYGKGFAGLTQPNNLLDMGNSGTSMRLIAGVLAGQEFEVTMVGDNSLSKR 124
          IED +IT+ G G AGL P N L+MGNSGTS +RLI +GVLAG +FEV M GD+SLSKR
Sbjct: 61 E IEDKDGVI TVQGVGMAGLKAPQNALNMGNSGTS I RLI SGVLAGADFEVEMFGDDSLSKR 120

Query: 125 PMDRIALPLSKMGARISGVTNRDLPPLKLQGTKKLKPIFYHLPVASAQVKSALIFAALQT 184
           PMDR+ LPL KMG ISG T RDLPPL+L+GTK L+PI Y LP+ASAQVKSAL+ FAALQ
Sbjct: 121 PMDRVTLPLKKMGVSISGQTERDLPPLRLKGTKNLRPIHYELPIASAQVKSALMFAALQA 180

Query: 185 KGESLIVEKEQTRNHTEDMIRQFGGHLDIKDKEIRLNGGQSLVGQDIRVPGDISSAAFWI 244
           KGES+I+EKE TRNHTEDM++QFGGHL + K+I + G Q L GQ + VPGDISSAAFW+
Sbjct: 181 KGESVIIEKEYTRNHTEDMLQQFGGHLSvDGKKITVQGPQKLTGQKVWPGDISSAAFWL 240

Query: 245 VAGLIIPNSHIILENVGINETRTGILDWSKMGGKIKLSSVDNQVKSATLTVDYSHLQAT 304
           VAGLI PNS ++L+NVGINETRTGI+DV+ MGGK++++ +D KSATL V+ S L+ T
Sbjct: 241 VAGLIAPNSRLVLQNVGINETRTGIIDVIRAMGGKLEITEIDPVAKSATLIVESSDLKGT 300

Query: 305 HISGAMIPRLIDELPIIALLATQAQGTTVIADAQELKVKETDRIQVVVESLKQMGADITA 364
           I GA+IPRLIDELPIIALLATQAQG TVI DA+ELKVKETDRIQW ++L MGADIT
Sbjct: 301 EICGALIPRLIDELPIIALIATQAQGVWIKDAEELKVKETDRIQWADALNSMGADITP 360

Query: 365 TADGMIIRGNTPLHAASLDCHGDHRIGMMIAIAALLVKEGEVDLSGEEAINTSYPNFLEH 424
           TADGMII+G + LH A ++ GDHRIGMM AIAALLV +GEV+L EAINTSYP+F +
Sbjct: 361 TADGMIIKGKSALHGARVNTFGDHRIGMMTAIAALLVADGEVELDRAEAINTSYPSFFDD 420

Query: 425 LEGLVN 430
           LE L++
Sbjct: 421 LESLIH 426





A related DNA sequence was identified in S.pyogenes <SEQ ID 3957> which encodes the amino acid 
sequence <SEQ ID 3958>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial membrane Certainty=0.1723 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco



Possible site: 36

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.18 Transmembrane 240 - 256 ( 239 - 256) 



Final Results 

bacterial membrane Certainty=0.1871 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

>GP:AAD45819 GB:AF169483 5-enolpyruvylshikimate-3-phosphate synthase

[Streptococcus pneumoniae] 
Identities = 278/426 (65%) , Positives = 346/426 (80%) 

Query: 4 MKLRTNAGPLQGTIQVPGDKSISHRAVILGAVAKGETRVKGLLKGEDVLSTIQAFRNLGV 63 
MKL+TN L G I+VPGDKSISHR++I G++A+GET+V +L+GEDVLST+Q FR+LGV

Sbjct: 1 MKLKTNIRHLHGIIRVPGDKSISHRSIIFGSLAEGETKVYDILRGEDVLSTMQVFRDLGV 60 

Query: 64 RIEEKDDQLVIEGQGFQGLNAPCQTLNMGNSGTSMRLIAGLLAGQPFSVKMIGDESLSKR 123 
IE+KD + ++G G GL AP LNMGNSGTS+RLI+G+LAG F V+M GD+SLSKR 
Sbjct: 61 EIEDKDGVITVQGVGMAGLKAPQNALNMGNSGTSIRLISGVLAGADFEVEMFGDDSLSKR 120

Query: 124 PMDRI VYPLKQMGVE I SGETDRQFPPLQLQGNRNLQPITYTLP I S SAQVKSAI LLAALQA 183 

PMDR+ PLK+MGV ISG+T+R PPL+L+G +NL+PI Y LPI+SAQVKSA++ AALQA 
Sbjct: 121 PMDRVTLPLKKMGVSISGQTERDLPPLRLKGTKNLRPIHYELPIASAQVKSALMFAALQA 180 

Query: 184 KGTTQWEKEITRNHTEEMIQQFGGRLIVDGKRITLVGPQQLTAQEITVPGDISSAAFWL 243 

KG + ++EKE TRNHTE+M+QQFGG L VDGK+IT+ GPQ+LT Q++ VPGDISSAAFWL 
Sbjct: 181 KGESVIIEKEYTRNHTEDMLQQFGGHLSvDGKKITVQGPQKLTGQKVWPGDISSAAFWL 240 

Query: 244 VAGLIIPGSELLLKNVGVNPTRTGIIiEVVEKMGAQIVYEDMNKXEQVTSIRVVySNMKGT 303

VAGLI P S L+L+NVG+N TRTGI++V+ MG ++ +++ + ++ V S++KGT 
Sbjct: 241 VAGLIAPNSRLVLQNVGINETRTGIIDVIRAMGGKLEITEIDPVAKSATLIvESSDLKGT 300 

Query: 304 IISGGLIPRLIDELPIIALLATQAQGTTCIKmQELRVKETnRIQVVTDILNSMGANIKA 363 
I G LIPRLIDELPIIALLATQAQG T IKDA+EL+VKETDRIQW D LNSMGA+I

Sbjct: 301 EICGALIPRLIDELPIIALIjATQAQGvTVIKDAEELKVKETDRIQVVADALNSMGADITP 360 

Query: 364 TADGMIIKGPTVLYGANTSTYGDHRIGMMTAIA2iLLVKQGQvHLDKEEAIMTSYPTFFKD 423 
TADGMIIKG + L+GA +T+GDHRIGMMTAIAALLV G+V LD+ EAI TSYP+FF D 
Sbjct: 361 TADGMIIKGKSALHGARVNTFGDHRIGMMTAIAALLVADGEVELDRAEAINTSYPSFFDD 420

Query: 424 LERLCH 429 

LE L H 
Sbjct: 421 LESLIH 426 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 269/424 (63%) , Positives = 331/424 (77%) 

Query: 5 MKLLTNANTLKGTIRVPGDKSISHRAIIFGSISQGVTRIVDVLRGEDVLSTIEAFKQMGV 64 
MKL TNA L+GTI+VPGDKSISHRA+I G++++G TR+ +L+GEDVLSTI +AF+ +GV

Sbjct: 4 MKLRTNAGPLQGT I QVPGDKS I SHRAVIIjGAVAKGETRVKGLLKGEDVLST IQAFRNLG V 63 

Query: 65 LIEDDGEIITIYGKGFAGLTQPNNLLDMGNSGTSMRLIAGVLAGQEFEVTMVGDNSLSKR 124 
IE+ + + I G+GF GL P L+MGNSGTSMRLIAG+LAGQ F V M+GD SLSKR 
Sbjct: 64 RIEEKTJDQLVIEGC^FQGLNAPCQTIjNMGWSGTSI^IAGLI^GQPFSVKMIGDESLSKR 123

Query: 125 P^RIALPLSKMGARISGvTNRDLPPLKLCGTKKLKPIFYHLPVASAQVKSALIFAALQT 184 

PMDRI PL +MG ISG T+R PPL+LQG + L+PI Y LP++SAQVKSA++ AALQ 
Sbjct: 124 PMDRI vYPLKQMGvEISGETDRQFPPLQLQGNRNLQPITYTLPISSAQVKSAILLAALQA 183 

Query: 185 KGESLIVEKEQTRNHTEDMIRQFGGHLDIKDKEIRLNGGQSLVGQDIRVPGDISSAAFWI 244 

KG + +VEKE TRNHTE+MI+QFGG L+ KILGQL Q+I VPGDISSAAFW+ 
Sbjct: 184 KGTTQWEKEITRNHTEEMIQQFGGRLIVDGKRITLVGPQQLTAQEITVPGDISSAAFWL 243 

Query: 245 VAGLI I PNSHI I LENVGINETRTGI LOWS KMGGKI KLSSVDNQVKSATLTVDYSHLQAT 304

VAGLIIP S ++L+NVG+N TRTGIL+W KMG +1 ++ + +++ V YS+++ T
Sbjct: 244 VAGLIIPGSELLLI<WGVNPTRTGILEVVEKM(aQIVyEDMNKKEQVTSIRVVySNMKGT 303

Query: 305 HISGMIPRLIDELPIIALIATQAQGTWIiUJAQELKVKETDRIQVVVESLKQMGtADITA 364 

ISG + 1 PRL IDELP I IALLATQAQGTT I DAQEL+VKETDRIQW + L MGA+I A 
Sbjct: 304 IISGGLIPRLIDELPIIALLATQAOGTTCIKDAQELRVKETDRIQVVTDILNSMGANIKA 363 

Query: 365 TADGMIIRGNTPLHAASLDCHGDHRIGMIAIAALLVKEGEVDLSGEEAINTSYPNFI.EH 424 

TADGMII+G T L+ A+ +GDHRIGMM AIAALLVK+G+V L EEAI TSYP F + 
Sbjct: 364 TADGMIIKGPTVLYGANTSTYGDHRIGNMTAIAALLVKQGQVHLDKEEAIMTSYPTFFKD 423 

Query: 425 LEGL 428 
LE L 

Sbjct: 424 LERL 427 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
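
The alignments above place, between each Query line and Sbjct line, a middle line that repeats the residue wherever the two sequences are identical. The following Python fragment is a minimal sketch of how the identity portion of such a middle line could be recomputed from a pair of aligned strings. The function names are hypothetical. The "+" symbols shown in the alignments mark conservative substitutions under a scoring matrix and are deliberately not reproduced here, so the sketch yields identities only.

```python
def identity_midline(query: str, subject: str) -> str:
    """Build a middle line showing only identical, non-gap positions."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        q if q == s and q != "-" else " "
        for q, s in zip(query, subject)
    )


def count_identities(query: str, subject: str) -> int:
    """Count the positions at which both aligned sequences agree."""
    return sum(1 for c in identity_midline(query, subject) if c != " ")


# Final block of the GAS/GBS alignment in this Example.
midline = identity_midline("LEGL", "LERL")
identities = count_identities("LEGL", "LERL")
print(repr(midline), identities)
```

Because optical recognition errors are present in many of the printed sequence lines, counts recomputed in this way from the text as reproduced here need not agree with the "Identities" figures stated for each alignment.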

Example 1287 

A DNA sequence (GBSx1364) was identified in S.agalactiae <SEQ ID 3959> which encodes the amino
acid sequence <SEQ ID 3960>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -1.12 Transmembrane 6 - 22 ( 6-22) 

Final Results 

bacterial membrane Certainty=0.1447 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF20148 GB:AF208390 actinin-like protein [Entamoeba 
histolytica] 

Identities = 62/236 (26%) , Positives = 107/236 (45%) , Gaps = 38/236 (16%) 

Query: 144 NYNSTNSSNPESMLFYEKQLKTWLSTH KNYYLDYK- - VTPI YQNNELI PRKIELK- 196
           N N + N + + L W+++ N+ D+K V + + +1+ +
Sbjct: 116 NANQQKNVNAKEEVVENNALLDWVNSFGLNVSNFSSDWKDGVALVKLTEAVSAGQIKFEQ 175

Query: 197 YVGIDKTGKLLPIFIGNKSTQDQFGI STVTLENTSPNATIDYLSGKAQN 245
           + G+D T ++ K +QF I + E P + + Y+S +
Sbjct: 176 FSGLDNTQMVIDC QKLAYEQFKI PI LMDVKDLVCERPDPKS IMTYVSVYKERYEQLL 232

Query: 246 TVLSAKEQRKL IAKHEEEKRLAEK KVEEEKAAAETQKKL - EEEQARLAAEAQ - RK 298
           KE+++ IA+ E+E++ E+ + E+E+ A E Q++L EEQ RLA E Q RK
Sbjct: 233

Query: 299
           QKEEQ RLA E Q++++ QE+ +Q +P Q + + AA W
Sbjct: 293 QKEEQERLAREEQERKQREEQERLNQ QQPTSQQLTFFSVQAAADAW 338

A related DNA sequence was identified in S.pyogenes <SEQ ID 3961> which encodes the amino acid
sequence <SEQ ID 3962>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases:

>GP:CAA03161 GB:A49208 unnamed protein product [Streptococcus 
pyogenes] 

Identities = 54/222 (24%) , Positives = 93/222 (41%) , Gaps = 39/222 (17%) 

Query: 44 HYKNTVSSKLLP--FTANYQLQLGELDNLNRA TFSHIQLQDRHETKDVRTKINYD 96 

+YK +S++ P F + +LD LR T ++++ + + KN + 

Sbjct: 76 YYKTLGTSQITPALFPKAGDILYSKLDELGRTRTARGTLTYANVEGSYGVRQSFGK-NQN 134 

Query: 97 PVGWHN YQFPYGDG-SKSSWVT^GHLVGYQFCGI^EPRNLVMTAWLNTGAY 149 

P GW Y+ + +G S NR HL+ G + + + A T 

Sbjct: 135 PAGWTGNPNHVKYKIEWLNGLSYVGDFWNRSHLIADSLGG DALRWAVTGTRTQ 188 

Query: 150 SGANDSNPEGMLYYF^LDSWIALHPDFWLDYKVTPIYSGNEWPRQIELQYVGIDSSGE 209 

+ GM Y E R WL + D +L Y+V PIY+ +E++PR + 
Sbjct: 189 NVGGRDQKGGMRYTEQRAQEWLEANRDGyi/YYEVAPIYNADELIPRAV 236 

Query: 210 LLTIRLNSNKESIDENGVTTVILENSAPNINLDYLNGTATPK 251 

+ + S+ +I+E V++ N+A ++Y NGT T K 

Sbjct: 237 --WSMQSSDNTINEK VLVYNTANGYTINYHNGTPTQK 272 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 117/245 (47%) , Positives = 166/245 (67%) , Gaps = 4/245 (1%) 

Query: 2 KRKQFIKLGIATLLTVISLYTPINIATNHTTENI VTAQEY- -KTKENGTLPFKHKRQLVL 59 

K+K + + LL++ ++ A T N+ A + T + LPF QL L 

Sbjct: 5 KQKASLLTAVLLLLSLSITTITVDAARTOTYPNVSHAOTHYKNWSSKLLPFTANYQLQL 64 

Query: 60 GELDDKGS^TFAHIQLKVKDEPKKKRVKRLKTTPVGWHNFKFYY 119 

GELD+ RATF+HIQL+ + E K R K + PVGWHN++F Y DG++ +W+M+RG L+ 
Sbjct: 65 GELDNLNRATFSHIQLQDRHETKDWTK-INYDPVGWHireQFPYGDGSKSSWVMNRGHLV 123 

Query: 120 CHQFSGIMffiRKNLVPMTNWIOTGNYNSTNSSNPES 179 

+QF GliN+E +NLV MT WLNTG Y+ N SNPE ML+YE +L +WL+ H +++LDYKV 
Sbjct: 124 GYQFCGLNDEPRISttjVAMTAWIJSrrGAYSGAtTOSOT 183 

Query: 180 TPI YQNNELI PRKIELKYVGIDKTGKLLP I F I - GNKSTQDQFGISTVTLENTSPNATIDY 238 

TPIY NE++PR+IEL+YVGID +G+LL I + NK + D+ G++TV LEN++PN +DY 
Sbjct: 184 TPIYSGNEWPRQIELQYVGIDSSGELLTIRLNSNKESIDENGVTTVILENSAPNINLDY 243 

Query: 239 LSGKA 243 

L+G A 
Sbjct: 244 IiNGTA 248 

A related DNA sequence was identified in S.pyogenes <SEQ ID 7263> which encodes amino acid sequence 
<SEQ ID 7264>. An alignment of the GAS and GBS sequences follows: 

Score = 58.9 bits (140), Expect = 2e-ll 

Identities = 34/103 (33%) , Positives = 55/103 (53%) , Gaps = 1/103 (0%) 

Query: 1 MPFKTNLKAGI LLYAMFMAS I FLL VLQ VYIiSQVTALHKEYQAQTDYVKARLI AE I VYQD - 59 

M K LKAGILL A+ +A++F LVLQ YL+++ A ++Y +Q + KA L A++ Y+ 
Sbjct: 1 MILKZKLKAGILLQAIVlAAVFTLVLQFYIiARILATERQYHSQIEASKAYLTAQLAYKTI 60 

Query: 60 HRYKASNPVFFKGGQVICRERKERWMLIVKLDQQRQYQFEYLK 102 

S +F GG + + V LD+ Y ++ + 

Sbjct: 61 EGDS I SGKCYFTGGYAS YLQEGNYLQVKVTIiDKGGNYNHKFYR 103 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1288

A DNA sequence (GBSx1365) was identified in S.agalactiae <SEQ ID 3963> which encodes the amino
acid sequence <SEQ ID 3964>. This protein is predicted to be enolase (eno). Analysis of this protein 
sequence reveals the following: 

Possible site: 43

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3025 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA81815 GB:AB029313 enolase [Streptococcus intermedius] 
Identities = 396/435 (91%) , Positives = 414/435 (95%) , Gaps = 1/435 (0%) 



Query: 1 MSIITDVYAREVLDSRGNPTLEVEVYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRYG 60
         MSIITDVYAREVLDSRGNPTLEVEVYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRYG
Sbjct: 1 MSIITDVYAREVLDSRGNPTLEVEVYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRYG 60

Query: 61 GLGTQKAVDNVNNVIAEAI IGYDVRDQQAIDRAMIALDGTPNKGKLGANAILGVS IAVAR 120
          GLGTQKAVDNVNN+ IAEA+ IGYDVRDQQAIDRAMIALDGTPNKGKLGANAILGVS IAVAR
Sbjct: 61 GLGTQKAVDNVNNIIAEAVIGYDVRDQQAIDRAMIALDGTPNKGKLGANAILGVSIAVAR 120

Query: 121 AAADYLEVPLYSYLGGFNTKVLPTPMMNIINGGSHSDAPIAFQEFMIMPVGAPTFKEALR 180
           AAADYLE+PLYSYLGGFNTKVLPTPMMNI INGGSHSDAPIAFQEFMI +P GAPTFKEALR
Sbjct: 121 AAADYLEIPLYSYLGGFNTKVLPTPMMNIINGGSHSDAPIAFQEFMIVPAGAPTFKEALR 180

Query: 181 WGAEVFHALKKILKERGLETAVGDEGGFAPKFEGTEDGVETILKAIEAAGYEAGENGIMI 240
           WGAE+FHALKKILK RGL TAVGDEGGFAP+F+GTEDGVETIL AIEAAGY G++ + +
Sbjct: 181 WG^IFHALKKILKBRGLATAVGDEGGFAPRFMsTEDGTO^ 239

Query: 241 GFDCaSSEFYDAERKVYDYSKFEGEGGAVRTAAEQIDYLEELVNKYPIITIEDGMDENDW 300
           GFDCASSEFYD ERKVYDY+KFEGEG AVRTA EQIDYLEELVNKYPI ITIEDGMDENDW
Sbjct: 240 GFDCASSEFYDKERKVYDYTKFEGEGAAVRTADEQIDYLEELVNKYPIITIEDGMDENDW 299

Query: 301 DGWKALTERLGGRVQLVGDDFFVTNTDYLARGIKEEAANSILIKVNQIGTLTETFEAIEM 360
           DGWK LTERLG +VQ VGDDFFVTNT YL +GI E ANSILIKVNQIGTLTETF+AIEM
Sbjct: 300 DGWKKLTERLGKKVQPVGDDFFVTNTSYIjEKGINEACTiNSILIKvNQIGTLTETFDAIEM 359

Query: 361 AKEAGYTAWSHRSGETEDSTIADIAVATNAGQIKTGSLSRTDRIAKYNQLLRIEDQLGE 420
           AKEAGYTAWSHRSGETEDSTIADIAVA NAGQIKTGSLSRTDRIAKYNQLLRIEDQLGE
Sbjct: 360 AKEAGYTAWSHRSGETEDSTIADIAVAANAGQIKTGSLSRTDRIAKYNQLLRIEDQLGE 419

Query: 421 VAQYKGIKSFYNLKK 435
           VA+Y+G+KSFYNL K
Sbjct: 420 VAEYRGLKSFYNLSK 434





Proteins in the glycolysis/gluconeogenesis pathway have been experimentally detected on the surface of 
Streptococci. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3965> which encodes the amino acid 
sequence <SEQ ID 3966>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3025 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases:

>GP:BAA81815 GB:AB029313 enolase [Streptococcus intermedius] 
Identities = 396/435 (91%) , Positives = 415/435 (95%) , Gaps = 1/435 (0%) 

Query: 1 MS 1 1 TD VYARE VLDSRGNPTLEVE VYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRYL 60 

MSIITDVYAREVLDSRGNPTLEVEVYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRY 
Sbjct: 1 MSIITDWAREVLDSRGNPTLEVEVYTESGAFGRGMVPSGASTGEHEAVBLRDGDKSRYG 60 

Query: 61 GLGTQKAVDNVNNIIAEAIIGYDVRDQQAIDRAMIALDGTPNKGKLGANAILGVSIAVAR 120

GLGTQKAVDNVNNIIAEA+IGYDVRDQQAIDRAMIALDGTPNKGKXGANAILGVSIAVAR 
Sbjct: 61 GLGTQKAVDNVNNI I AEAVIGYDVRDQQAI DRAM I ALDGTPNKGKLGANAI LGVS IAVAR 120 

Query: 121 AAADYLEVPLYTYLGGFNTKVLPTPMMNI INGGSHSDAP IAFQEFMIMPVGAPTFKEGLR 180 
AAADYLE+PLY+YLGGFNTKVLPTPMMNIINGGSHSDAPIAFQEFMI+P GAPTFKE LR

Sbjct: 121 AAADYLEIPLYSYLGGFNTKVLPTPMMNI INGGSHSDAP IAFQEFMIVPAGAPTFKEALR 180 

Query: 181 WGAEVFHALKKILKERGLVTAVGDEGGFAPKFEGTEDGVETILKAIEAAGYEAGENGIMI 240 
WGAE+FHALKKIIiK RGL TAVGDEGGFAP+F+GTEDGVETIL AIEAAGY G++ + + 
Sbjct: 181 WGAEI FHALKKI LKSRGLATAVGDEGGFAPRFDGTEDGVETI LAAIEAAGYVPGKD - VFL 239

Query: 241 GFDCASSEFYDKERKvYDYTKFEGEGAAWTSAEQVDYLEELVNKYPIITIEDGMDENDW 300 

GFDCASSEFYDKERKVYDYTKFEGEGAAVRT+ EQ+DYLEELVNKYPI ITIEDGMDENDW 
Sbjct: 240 GFDCASSEFYDKERKVYDYTKFEGEGAAVRTADEQIDYLEELVNKYPIITIEDGMDENDW 299 

Query: 301 DGWKVLTERLGKRVQLVGDDFF\OTITEYLARGIKENAANSILIKvNQIGTLTETFEAIEM 360 

DGWK LTERLGK+VQ VGDDFFVTNT YL +GI E ANSILIKVNQIGTLTETF+AIEM 
Sbjct: 300 DGWKKLTERLGKCTQPVGDDFFVTNTSYLEKGINEACANSIIjIKvNQIGTLTETFDAIEM 359 

Query: 361 AKEAGYTAWSHRSGETEDSTIADIAVATNMQIKTCMIiSRTDRIAKYWQLIiRIEDQLGE 420

AKEAGYTAWSHRSGETEDSTIADIAVA nagqiktgslsrtdriakynqllriedqlge 

Sbjct: 360 AKEAGYTAWSHRSGETEDSTIADIAVAANAGQIKTGSLSRTDRIAKYNQLLRIEDQLGE 419 

Query: 421 VAQYKGIKSFYNLKK 435 
VA+Y+G+KSFYNL K

Sbjct: 420 VAEYRGLKSFYNLSK 434 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 421/435 (96%) , Positives = 427/435 (97%) 

Query: 1 MSIITDVYAREVLDSRGNPTLEVEVYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRYG 60 

MSIITDVYAREVLDSRGNPTLEvEVYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRY 
Sbjct: 1 MSIITDVYAREVLDSRGNPTLEVEVYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRYL 60 

Query: 61 GLGTQKAVDNVNNVIAEAIIGYDVRDQQAIDRAMIALDGTPNKGKLGANAILGVSIAVAR 120

GLGTQKAVDNVNN+IAEAIIGYDvRDQQAIDRAMIALDGTPNKGKLGANAILGVSIAVAR 
Sbjct: 61 GLGTQKAvDNVNNIIAEAIIGYDVRDQQAIDRAMIALDGTPNKGKLGANAILGVSIAVAR 120 

Query: 121 AAADYLEVPLYSYLGGFNTKVLPTPMMNIINGGSHSDAPIAFQEFMIMPVGAPTFKEALR 180 
AAADYLEVPLY+YLGGFNTKVLPTPMMNI INGGSHSDAP IAFQEFMIMPVGAPTFKE LR

Sbjct: 121 AAADYLEVPLYTYLGGFNTKVLPTPMMNI INGGSHSDAPIAFQEFMIMPVGAPTFKEGLR 180 

Query: 181 WGAEVFHALKKILKERGLETAVGDEGGFAPKFEGTEDGVETILKAIEAAGYEAGENGIMI 240 
WGAEVFHALKKI LKERGL TAVGDEGGFAPKFEGTEDGVETILKAIEAAGYEAGENGIMI 
Sbjct: 181 WGAEVFHALKKILKERGLVTAVGDEGGFAPKFEGTEDGVETILKAIEAAGYEAGENGIMI 240

Query: 241 GFDCASSEFYDAERKVYDYSKFEGEGGAVRTAAEQIDYLEELVNKYPI ITIEDGMDENDW 300 

GFDCASSEFYD ERKVYDY+KFEGEG AVRT+AEQ+DYLEELVNKYPIITIEDGMDENDW 
Sbjct: 241 GFDCASSEFYDKERKVYDYTKFEGEGAAVRTSAEQVDYLEELvNKYPI ITIEDGMDENDW 300 

Query: 301 DGWKALTERLGGRVQLVGDDFFVTNTDYLARGIKEEAANSILIKVNQIGTLTETFEAIEM 360 

DGWK LTERLG RVQLVGDDFFVTNT+YLARGIKE AANSILIKVNQIGTLTETFEAIEM 
Sbjct: 301 DGWKVLTERLGKRVQLVGDDFFVTNTEYLARGI KENAANS I LIKVNQIGTLTETFEAIEM 360 

Query: 361 AKEAGYTAWSHRSGETEDSTIADIAVATNAGQIKTGSLSRTDRIAKYNQLLRIEDQLGE 420

AKEAGYTAWSHRSGETEDSTIADIAVATNAGQIKTGSLSRTDRIAKYNQLLRIEDQLGE
Sbjct: 361 AKEAGYTAWSHRSGETEDSTIADIAVRTNAGQIKTGSLSRTDRIAKYNQLLRIEDQLGE 420

Query: 421 VAQYKGIKSFYNLKK 435 
VAQYKGI KSFYNLKK 
Sbjct: 421 VAQYKGIKSFYNLKK 435

SEQ ID 3964 (GBS311) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 45 (lane 3; MW 51kDa). 

GBS311-His was purified as shown in Figure 203, lane 10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
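
Each alignment in this Example is introduced by a summary line such as "Identities = 421/435 (96%) , Positives = 427/435 (97%)". The following Python fragment is a minimal sketch of how the counts in such a line might be extracted and expressed as exact fractions. The names `parse_alignment_summary` and `fraction_of` are hypothetical, and the sketch makes no assumption about how the printed percentages were rounded by the alignment program.

```python
import re
from fractions import Fraction

# Matches "Identities = 273/486", "Positives = 340/486" and "Gaps = 32/486",
# tolerating variable spacing around "=" and "/".
_SUMMARY_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line: str) -> dict:
    """Return {field name: (count, aligned length)} for one summary line."""
    return {name: (int(n), int(d)) for name, n, d in _SUMMARY_FIELD.findall(line)}


def fraction_of(summary: dict, field: str) -> Fraction:
    """Express one field of a parsed summary as an exact fraction."""
    count, length = summary[field]
    return Fraction(count, length)


summary = parse_alignment_summary(
    "Identities = 273/486 (56%) , Positives = 340/486 (69%) , Gaps = 32/486 (6%)"
)
identity = fraction_of(summary, "Identities")
print(summary, float(identity))
```

Summary lines that omit the "Gaps" field, as in the alignment of the GAS and GBS proteins above, simply produce a dictionary without that key.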

Example 1289 

A DNA sequence (GBSx1366) was identified in S.agalactiae <SEQ ID 3967> which encodes the amino
acid sequence <SEQ ID 3968>. Analysis of this protein sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1998 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
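
The "Final Results" block of each Example lists a certainty value for three candidate locations (cytoplasm, membrane, outside). The following Python fragment is a minimal sketch of how such a block could be read and the location with the highest certainty selected. The function names are hypothetical, and the pattern is written, as an assumption, to tolerate the stray spaces that appear inside the printed numbers (for example "0 . 1998").

```python
import re

# One result line: "bacterial <location> Certainty=<value> (<label>)".
# Spaces inside the number are allowed and removed afterwards.
_RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*(\d)\s*\.\s*([\d ]+?)\s*\("
)


def parse_final_results(text: str) -> dict:
    """Return {location: certainty} for every result line found in text."""
    scores = {}
    for location, whole, frac in _RESULT.findall(text):
        scores[location] = float(f"{whole}.{frac.replace(' ', '')}")
    return scores


def best_location(scores: dict) -> str:
    """Return the location carrying the highest certainty value."""
    return max(scores, key=scores.get)


sample = """
bacterial cytoplasm Certainty=0 . 1998 (Affirmative) < suco
bacterial membrane Certainty=0 . 0000 (Not Clear) < suco
bacterial outside Certainty=0. 0000 (Not Clear) < suco
"""
scores = parse_final_results(sample)
print(scores, best_location(scores))
```

The sample text reproduces the values reported for SEQ ID 3968 in this Example.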

Example 1290 

A DNA sequence (GBSx1367) was identified in S.agalactiae <SEQ ID 3969> which encodes the amino
acid sequence <SEQ ID 3970>. This protein is predicted to be di-/tripeptide transporter. Analysis of this
protein sequence reveals the following:

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.6731 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9395> which encodes amino acid sequence <SEQ ID 9396> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12175 GB:Z99106 similar to di-tripeptide ABC transporter
(membrane protein) [Bacillus subtilis] 
Identities = 175/359 (48%) , Positives = 254/359 (70%) , Gaps = 9/359 (2%) 

Query: 1 MVGNLYGENDSRRDAGFSIFVFGINLGAFISPIWGYLGQEVNFHLGFSLAAIGMFFGLL 60

+VG+DY + D RRD+GFSIF GINLG ++P++VG LGQ+ N+HLGF AA+GM GL+ 
Sbjct: 142 WGDLYTKEDPRRDSGFSIFYMGINLGGLIAPLIVGTLGQKYNYHLGFGAAAVGMLLGIiI 201 

Query: 61 QYTLDGKKYLTEESLRPNDPLSPEEKSSLYKKVGLILIGIVIVLILLHLMHMLTIEVIID 120 
+ L KK L +PLS +KS++ +G+I++ I +++ + +LTI+ ID

Sbjct: 202 VFPLTRKKNLGLAGSNVPNPLS - - KKSAIGTGIGVI IVAIAVI I SVQ - - TGVLTI KRF ID 257 

Query: 121 IFSIIAIAIPIIYFIKILSSKKISSVERSRVWAYIPLFIASILFWSIEEQGSWLALFAD 180 
+ SI+ I IP+IYFI + +SKK E+SR+ AY+PLFI +++FW+I+EQG+ +LA++AD 
Sbjct: 258 LVSILGILIPVIYFIIMFTSKKADKTEKSRLAAYVPLFIGAVMFWAIQEQGATILAVYAD 317

Query: 181 EQTKLYLNFFGHHINFPSSYFQSMNPLFIMLYVPFFAWLWAKWGSKQPSSPKKFAYGLFF 240 

E+ +L L F SS+FQS+NPLF++++ P FAWLW K G +QPS+P KF+ G+ 

Sbjct: 318 ERIRLSLGGF ELQSSWFQSLNPLFWI FAPI FAWLWMKLGKRQPSTPVKFS IGI IL 373 

Query: 241 AGASFLWMMLPGLLFGVNAKVSPLWLTMSWAIVIVGEMLISPVGLSATSKLAPKAFQAQM 300 

AG SF+ M+ P + G A VSPLWL +S+ +V++GE+ +SPVGLS T+KLAP AF AQ 
Sbjct: 374 AGLSFI IMVFPAMQ - GKEALVS PLWLVLS FLLWLGELCLS PVGLS VTTKLAPAAFSAQT 432 

Query: 301 MSIWFLSNAAAQAINAQIVKLYTPDTQTLYYGWGGITWFGFILLFYVPRIEKLMSGV 359

MS+WFL+NAAAQAINAQ+ L+ +T+Y+G +G I++V G ILL P I++ M GV 
Sbjct: 433 MSMWFLTNAAAQAINAQ VAGLFDKI PETMYFGT I GLI S I VLGGILLLLS PVI KRAMKGV 491 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1291 

A DNA sequence (GBSx1369) was identified in S.agalactiae <SEQ ID 3971> which encodes the amino
acid sequence <SEQ ID 3972>. Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1292 

A DNA sequence (GBSx1370) was identified in S.agalactiae <SEQ ID 3973> which encodes the amino
acid sequence <SEQ ID 3974>. Analysis of this protein sequence reveals the following: 

50 Possible site: 30 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.2485 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF61315 GB:U96166 unknown [Streptococcus cristatus]

Identities = 181/442 (40%), Positives = 270/442 (60%), Gaps = 2/442 (0%) 

Query: 1 MINLFDSYTQSSWDLHFSLIKSGYINPTIAIiTODGFLPDDVTSPYLYYTGFAKTGAGRPL 60 
MI LFD Y Q+S+DL SL +G P + + DDG+L DV SPY Y+TG T GRP+ 
Sbjct: 1 MICLFDRYDQASFDLLRSLKATGLDCPVVWQDDGYLSPDVESPYSYFTGDLDTPEGRPI 60

Query: 61 YYNELRVPDTWKI IGFSSGADIVDLGVKKGRI IYANPNHKRLIKEVDWFDEQGRVILKDR 120 

Y+N + P WEI + +I+D+G K+ I Y P H+R ++ V+W D +G+V D 
Sbjct: 61 YENLVPKPHLWEIRSSNVNGEILDMGKKRftNIFYRQPTHERRVRAVEWLDTEGQVRAADI 120 

Query: 121 FNKFGFCFAQTFYNADGQAIQTSYYNKDRQEVISENHMTGDYILNDNNQFKVFKSKVEFV 180 

+N+ G FAQ Y+ + T Y+++ VI ENH+TGD IL + +FKSK EFV 
Sbjct: 121 YNRKGRLFAQITYDQTQRPTHTRYFDQSNVWIMENHLTGDIILTLEGKRHIFKSKQEFV 180 

Query: 181 INYLQEAKFNLDRIFYNSLSTPFLVSFYL--NRLESKDVLFWQEPLVDDIPGNMRLLLNN 238

+ YLQ ++ DRI YNSL+TPFLV++ L ++DVLFWQEP+ + +PGNM++ + 

Sbjct: 181 VFYLQYRGYDTDRIIYNSLATPFLVAYALRPKNGRAEDVLFWQEPIGEALPGNMKVAMKM 240 

Query: 239 PSPNTKIVIQSYEAYANAMRLLTDEEQKQVSFLGFMYPLKETEKLHNQALILTNSDQIEA 298 
P N +1 +Q + Y L T EE+ +G++Y + ++ +ALILTNSDQ+E

Sbjct: 241 PHRNIRIAVQDRQWEKIQSIATPEEKVYFHNIGYIYDYQRLNNMNPEALILTNSDQLEQ 300 

Query: 299 LESLVTSLPNLTFNIGALTEMSSDLMNFGKYDNVVLYPNITTNQIQYLSNICAFYLDINH 358 
+E L+T LPN+ F+IGA+TEMS LM +Y NV LYPNI ++ L C YLDIN 
Sbjct: 301 IEQLLTQLPin^FHIGAITE^GHLMGLNRYPOTSLYPNIRPAKVAELFERCDLYLDINI 360

Query: 359 HNE I LSAVRSAFEHQQLI FAFEETSHQIRFVSPKNI FPKKDI FTFI SHLQPLIGNKCNIE 418 

+EIL+A R+AFE+ LI +F T H RF++ +1+ +++ + +Q + + +E 
Sbjct: 361 SDEILNACRTAFENNMLILSFTNTCHSRRFIADDHIYAPENVSGMVDKIQSALAHSSEME 420 

Query: 419 KALKQQLEDCHVSSSTQYQSVI 440 

AL +Q + + +S QY+++I 
Sbjct: 421 AALTRQKQAANQASLEQYKAI I 442 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
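
The summary line reported for each GENPEPT hit (for example "Identities = 181/442 (40%), Positives = 270/442 (60%), Gaps = 2/442 (0%)") can be read mechanically. The following is a minimal Python sketch for illustration only; it is not part of the original analysis. The function name parse_alignment_summary is hypothetical, and the pattern assumes the BLAST-style layout used in this document. It returns the raw counts and their fractions and ignores the printed percentages.

```python
import re

# Matches one "Name = matched/total" field; the printed percentage is ignored.
_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line):
    """Return {field: (matched, total, fraction)} for a BLAST-style summary line."""
    result = {}
    for name, matched, total in _FIELD.findall(line):
        m, t = int(matched), int(total)
        result[name] = (m, t, m / t)
    return result


summary = parse_alignment_summary(
    "Identities = 181/442 (40%), Positives = 270/442 (60%), Gaps = 2/442 (0%)"
)
for name, (m, t, frac) in summary.items():
    print(f"{name}: {m}/{t} = {frac:.3f}")
```

Fields that are absent from a line (some hits in this document report no Gaps value) are simply missing from the returned dictionary.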

Example 1293 

A DNA sequence (GBSx1371) was identified in S.agalactiae <SEQ ID 3975> which encodes the amino
acid sequence <SEQ ID 3976>. Analysis of this protein sequence reveals the following:

Possible site: 22 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.06 Transmembrane 405 - 421 ( 404 - 422) 

Final Results

bacterial membrane Certainty=0.1022 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA94320 GB:AB033763 hypothetical protein [Staphylococcus 
aureus] 

Identities = 66/195 (33%) , Positives = 99/195 (49%) , Gaps = 9/195 (4%)

Query: 259 NYyDYQFTNANRFDFFITSTDKQTELLF^XJFKQFTNHNPRI I TI PVGS ID NLKMPM 314
N Y + F N NR+ I ST +Q + N+ + TIPVG ID NLK
Sbjct: 15 NTYKHVFNNLNRYSGI I VSTKQQ QLD I SARINNEIPVHT I PVGYIDEHFTNLKRNN 70

Query: 315 DNRRPYSILTASRIASEKHVDWLVRAVIRIREILPEVTFDIYGSGGEEEKIRNIINAftNA 374
+ I++ +R + EK ++ + V ++ + P + +YG G EEEK + +1 N
Sbjct: 71 HSINNNKIISVARYSPEKQIjNHQIELVSKLIKEFPNIRLHLYGFGKEEEKYKQLITEYNL 130

Query: 375 TEYIRLMG-HKNLSNVYQNYELVLTASKSEGFGLTLLEAIGAGLPLIGFDVRYGNQTFIK 433
+ L G +NLS Q+ + L S EGF L LLE I G+P +G++ +YG I
Sbjct: 131 ENNVFLRGFRRNLSAEIQDAYMSLITSrMEGFNLGLLETITEGIPPVGYNSKYGPSELIL 190

Query: 434 DGENGYLIPRFDMDD 448
+ ENGYLI + D D+
Sbjct: 191 NNENGYL INKNDKDE 205

SEQ ID 3976 (GBS426) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 80 (lane 4; MW 58.8kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 173 (lane 3; MW 84kDa). 

GBS426-GST was purified as shown in Figure 220, lane 5.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1294 

A DNA sequence (GBSx1372) was identified in S.agalactiae <SEQ ID 3977> which encodes the amino
acid sequence <SEQ ID 3978>. This protein is predicted to be preprotein translocase seca subunit (secA).
Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.69 Transmembrane 75 - 91 ( 75 - 91) 

Final Results

bacterial membrane Certainty=0.1277 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC44957 GB:U56901 involved in protein export [Bacillus subtilis] 
Identities = 336/794 (42%) , Positives = 506/794 (63%) , Gaps = 29/794 (3%) 



Query: NSLFSLDKKRLKKLQRTIiNTINSLKGQMATLSNEELQAKTTEFRKRLVNGETLDDICAEA 64
N +F K+ L + ++ N I++++G LS++ L+ KT EF++RL G T DD+ EA
Sbjct: 6 NKMFDPTKRTLNRYEKIANDIDAIRGDYENLSDDALKHKTIEFKERLEKGATTDDLLVEA 65

Query: 65 FAVVREADERVLGLFPYDVQVIGGLVLHQGNTAEMKTGEGKTLTATMPLYLNALEGKGAM 124
FAWREA RV G+FP+ VQ++GG+ LH GN AEMKTGEGKTLT+T+P+YLNAL GKG
Sbjct: 66 FAvWEASRRVTGMFPFKVQLMGGVALHDGNIAEMKTGEGKTLTSTLPVYLNALTGKGVH 125

Query: 125 LLTNNSYIAIRDAEEMGKVYRFLGLSVGVGVSDNEEEDRDAATKRAVYSSDIVYSTSSAL 184
++T N YLA RDAE+MGK++ FLGL+VG+ ++ +++ KR Y++DI YST++ L
Sbjct: 126 WTVNE YLASRDAEQMGKI FEFLGLTVGLNLNSMSKDE KREAYAADITYSTNNEL 180

Query: 185 GFDYLIDNLASSKSQKYMPKLHYAIVDEADAVLLDMAQTPLVISGSPRVQSNLYKIADEL 244
GFDYL DN+ K Q LH+A++DE D++L+D A+TPL+ISG + LY A+
Sbjct: 181 GFDYLRDNMVLYKEQMVQRPLHFAVIDEVDSILIDEARTPLIISGQAAKSTKLYVQANAF 240

Query: 245 ILS FEEQ VDYYFDKERQE VWI KNQG VREAERYFRI PHFYKQSNREL VRHLNLSLKAHKLF 304
+ + + + DY+D + + V+ +G+ +AE+ F I + + + L H+N +LKAH
Sbjct: 241 TOTLKAEKDYTYDIKTKAVQLTEEGMTKaEKAFGIDNLFDVTOWALNHHINQALKAHVAM 300

Query: 305 ERGKDYVVDDGEIKLLDATOGRVLEGTKLQGGVHQfl.IEQKEHLNVTPESRAMASITYQroj 364

++ DYW+DG++ ++D+ GR+++G + G+HQAIE KE h + ES +A+IT+QN 
Sbjct: 301 QKDVDYWEDGQWIVDSFTGRLMKGRRYSEGLHQAIEAKEGLE1QNESMTLATITFQNY 360 

Query: 365 FRMFTKIAGMTGTGKTAEKEFIEVYDMEWRIPTNSPVRRIDYPDKIYTTLPEKIHATIE 424 

FRM+ KLAGMTGT KT E+EF +Y+M+W IPTN PV R D PD IY T+ K A E 
Sbjct: 361 FRMYEKLAGMTGTAKTEEEEFRNIYNMQWTIPTNRPVVRDDRPDLIYRTMEGKFKAVAE 420 

Query: 425 FVKQVHDTGQPILLVAGSVRMSELFSELLIiSGIPHSLLNAQSAVKEaQMIAEAGQKGAV 484

V Q + TGQP+L+ +V SEL S+LL GIPH +LNA++ +EAQ+1 EAGQKGAV 
Sbjct: 421 DVAQRYMTGQPVLVGTVAVETSELISKLLKNKGIPHQTCNAKNHEREAQIIEEAGQKGAV 480 

Query: 485 TVATNMAGRGTDIKLGKGVSELGGLAVIGTERMKSQRMDLQLRGRSGRQGDIGFSQFFVS 544 
T+ATNMAGRGTDIKLG+GV ELGGIAV+GTER +S+R+D QLRGRSGRQGD G +QF++S

Sbjct: 481 TIATNMAGRGTDIKLGEGVKELGGLAWGTERHESRRIDNQLRGRSGRQGDPGITQFYLS 540 

Query: 545 FEDDLMIESGPKWAQDYFRKHRDKVNPEKPKALGQRRFQKLFQQTQEASDGKGESARSQT 604 
ED+LM G+ D++ + ++ + +Q+ +G +R Q 

Sbjct: 541 MEDELMRRFGAERTMAML DRFGMDDSTPIQSKMVSRAVESSQKRVEGNNFDSRKQL 596

Query: 605 IEFDSSVQLQREYVYRERNALINGESGHFSPRQIIDTVISSFI AYLDGEVEKEEL 659 

+4-+D ++ QRE +Y++R +1+ E + R+I++ +1 S + AY E EE 

Sbjct: 597 LQYDDVLRQQREVIYKQRFEVIDSE NLREIVENMIKSSLERAIAAYTPREELPEE- 651 

Query: 660 IFEVNRFI-FDNMSYNLQGISKEMSL--EEIKNYLFKIADEILREKHNLLGDSFG 711 

++++ + N +Y +G ++ + +E L I D 1+ K+N + FG 
Sbjct: 652 -WKLDGLVDLINTTYLDEGALEKSDIFGKEPDEMLELIMDRII-TKYNEKEEQFGKEQMR 709 

Query: 712 DFERTAALKAIDFAWIEEVDYLQQLRTVATARQT^RNPVFEYHKEAYKSYNIMKKEIRE 771

+FE+ L+A+D W++ +D + QLR R AQ NP+ EY E + + M + I • + 

Sbjct: 710 EFEKVIVLRAVDSKWMDHIDAMDQLRQGIHliRAYAQTNPLREYQMEGFAMFEHMIESIED 769

Query: 772 QTFRNLLLSEVSFN 785 
+ + ++ +E+ N

Sbjct: 770 EVAKFVMKAEIENN 783 

There is also homology to SEQ ID 3620. 

SEQ ID 3978 (GBS425) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 80 (lane 3; MW 91kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 173 (lane 2; MW 116kDa). 

GBS425-GST was purified as shown in Figure 220, lane 4. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1295

A DNA sequence (GBSx1373) was identified in S.agalactiae <SEQ ID 3979> which encodes the amino
acid sequence <SEQ ID 3980>. Analysis of this protein sequence reveals the following: 



Possible site: 43 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.3827 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1296 

A DNA sequence (GBSx1374) was identified in S.agalactiae <SEQ ID 3981> which encodes the amino
acid sequence <SEQ ID 3982>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.2683 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10001> which encodes amino acid sequence <SEQ ID
10002> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1297 

A DNA sequence (GBSx1375) was identified in S.agalactiae <SEQ ID 3983> which encodes the amino
acid sequence <SEQ ID 3984>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.5410 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
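
The localization prediction in each example is reported as three "Final Results" rows, one per compartment, each with a certainty value. As an illustration only, the following Python sketch reads such rows and picks the compartment with the highest certainty. It is not part of the original analysis; the names parse_final_results and most_likely_location are hypothetical, and the regular expression is an assumption about the row layout seen in this document. Spaces inside the number, as introduced by text extraction, are tolerated.

```python
import re

# Assumed row layout: "bacterial <location> Certainty=<value> (<verdict>) < succ>".
_ROW = re.compile(r"bacterial\s+(\w+).*?Certainty\s*=\s*([\d.\s]+?)\s*\((.+?)\)")


def parse_final_results(lines):
    """Return (location, certainty, verdict) tuples for every matching row."""
    rows = []
    for line in lines:
        match = _ROW.search(line)
        if match:
            location, value, verdict = match.groups()
            # Drop any whitespace inside the number, e.g. "0 . 5410" -> "0.5410".
            rows.append((location, float("".join(value.split())), verdict))
    return rows


def most_likely_location(rows):
    """Return the row with the highest certainty (the first one wins on ties)."""
    return max(rows, key=lambda row: row[1])


rows = parse_final_results([
    "bacterial cytoplasm Certainty=0.5410 (Affirmative) < succ>",
    "bacterial membrane Certainty=0.0000 (Not Clear) < succ>",
    "bacterial outside Certainty=0.0000 (Not Clear) < succ>",
])
best = most_likely_location(rows)
print(best[0], best[1], best[2])
```

When all three rows report a certainty of 0.0000, as in Example 1291, the sketch returns the first row, so a caller would need to treat that case as undetermined.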

Example 1298 

A DNA sequence (GBSx1376) was identified in S.agalactiae <SEQ ID 3985> which encodes the amino
acid sequence <SEQ ID 3986>. This protein is predicted to be preprotein translocase secy subunit. Analysis 
of this protein sequence reveals the following: 

Possible site: 59

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -9.92 Transmembrane 287 - 303 ( 278 - 309) 

INTEGRAL Likelihood = -9.08 Transmembrane 191 - 207 ( 186 - 210) 

INTEGRAL Likelihood = -8.44 Transmembrane 104 - 120 ( 101 - 123) 

INTEGRAL Likelihood = -8.23 Transmembrane 11 - 27 ( 9 - 41)

Final Results

bacterial membrane Certainty=0.4970 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF30659 GB:AE002122 preprotein translocase [Ureaplasma urealyticum] 
Identities = 105/422 (24%) , Positives = 213/422 (49%) , Gaps = 49/422 (11%) 

Query: 2 KLLYIFEKNIILRKILITFSLIIIFLLGRYVPIPGVLISAYKGQDNNFATLYSTvTGGNL 61 

+LL IF+ +L +++T S++I+F +G +P+P + ++ G +F ++ + + GG L 
Sbjct: 13 QLLMIFKNKKVLVALIVTLSILILFRIGSVIPMPYIKLNGNFGNQGSFFSIINLLGGGGL 72 

Query: 62 SQVGVFSLGIGPMMTTMILLRLFT IGKYSSGVSQKVQQFRQNWMLVIAI I 112 

SQ +F++GIGP +T I+++L + + K +K++ + ++ L +A++ 

Sbjct: 73 SQFSLFAIGIGPYITAQIIMQLLSSELVPPLAKLSKSGERGRKKIEVITR-IITLPLAVM 131 

Query: 113 QGLAITISFQYHNGFSL TKLLLATMI- -LVTGAYIISWIGNLNAEYGFG- 159 

Q + I NGF + L T I +VGYI ++ +L ++ G G 

Sbjct: 132 QAVIIINLMTRANGFISIVSNAPFAIGSPLFYVTY1FLMVGGTYISLFLADLISKKGVGN 191 

Query: 160 GMTILVWGMLVGQFNNIPLIFELF QDGYQLAIILFLLWTLVAMYLMITFERSE 213 

G+T+L++ G++ FN+ IF + + IL++IH- ++ + ++ S 

Sbjct: 192 GITLLILTGIVASLFNHFIAIFSNLGSLTSSKVSQIIGFILYILFYIMILIGWFVNNST 251 

Query: 214 YRIPVMRTS IHNRLvDMYMPIKVlIASGGMAFMYVYTIjLMFPQYIIILLRSIFPT 268 

+IPV +T H +L ■ ++PIK+ +G M ++ ++L P + L 
Sbjct: 252 RKI PVQQTGQALI LDHEKL PFLP I KIMTAGVMP VI FAS S VLAI PAQVAEFLDK Q 305 

Query: 269 NPDITSYNDYFSLSSIQGWIYMILMLVLSVaFTFVNIDPTKISEAMRESGDFIPNYRPG 328 

+ ++YF + S G+ IY++L+L+ + F++V ++P K++E ++++G FIP + G 

Sbjct: 306 SMGYYVIHNYFIVDSWTGIAIYVVLILLFTFFFSYVQLNPPKMAEDIKKAGRFIPGVQVG 365 

Query: 329 KETQSYLSKICYLFGTFSGFFMAFLGGVPLLFALGNDDLR TVSSMTGIFMM 379 

+T+ +++K+ Y +AFL +P L AL + T+ T I +M 

Sbjct: 366 I^TEKHITKVIYRVNWIGAPILAFLACLPHLVALVAKTINHGIPVIQPSTIFGGTSIIIM 425 

Query: 380 IT 381 
+T 

Sbjct: 426 VT 427 

There is also homology to SEQ ID 3988. 

A related GBS gene <SEQ ID 8783> and protein <SEQ ID 8784> were also identified. Analysis of
this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 6.32 
GvH: Signal Score (-7.5): -4.07 

Possible site: 59
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -1.54 Transmembrane 246 - 262 ( 245 - 262)

INTEGRAL Likelihood = -0.90 Transmembrane 372 - 388 ( 372 - 388) 

INTEGRAL Likelihood = -0.85 Transmembrane 64 - 80 ( 64 - 81) 
PERIPHERAL Likelihood =8.65 28 
modified ALOM score: 2.48 

*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.4970 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF02350(316 - 1500 of 1827) 

EGAD] 6621 | 6420 (8 - 426 of 431) preprotein translocase secy subunit {Bacillus sp.} 
SP|P38375|SECY_BACHD PREPROTEIN TRANSLOCASE SECY SUBUNIT. GP | 484251 1 dbj | BAA01191 . 1 1 | D10360 
secretion protein Y {Bacillus sp.} PIR|B44859 JB44859 preprotein translocase secY - Bacillus 
sp. 

%Match =12.1 

%Identity =26.8 %Similarity =55.4 

Matches = 109 Mismatches = 165 Conservative Sub.s = 116 

57 87 117 147 177 207 237 267 

EVWNVVDRCITEGKTIYGIRRARKDNQYISFERTMDDFEYLCDTIKQNR*SRRVMVT*ILKSIFLILKLTKLTI*SYLS* ' 

297 327 357 387 441 471 501 

REQIDREREIPLKLLYIFEKNIILRKILITFSLIIIFLLGRYVPIPGV--LISAYKGQDNNFATLYSTVTGGNLSQVGVF 

II •• II:: I: MH :| : » I : I I « : I I I I = I II I :| 
MFRTISNIFRVGDLRRKVIFTLLMLIVFRIGSFIPVPGTNREVLDFVDQANAFGFL-NTFGGGALGNFSIF 
10 20 30 40 50 60 70 

531 582 594 624 654 681 699 

SLGIGPMMTTMILLRLF- - -TIGKYSSGVSQ KVQQFRQNWMLVIAIIQGLAITISFQ-YHNGF SLTKLL 

::|| I =1 | : : : I : = h= = 1= II s »:|: || | ::: | : |: |:« | 

AMGIMPYITASIWQLLQMDWPKFAEWAKEGEAGRRKLAQFTRYGTIVVLGFIQALGMSVGFNNFFPGLIPNPSVSVYL 
80 90 100 110 120 130 140 150 

729 759 786 816 846 870 888 918 

LATMILVTGAYIISWIGNLNAEYGFG-GMTILWVGMLVGQFNNIPLIFEL-FQD-GYQL AIILFLLWTLVAMYLM 

: ::| | : |:| | | |::|;: |: | | : ||: II I || :||:| ::|: : 

FIALVLTAGTAFLMWLGEQITAKGVGNGISIIIFAGIAAGIPNGIiNLIYSTRIQDAGEQLFLNIVVILLILAIiAILAIIVG 
160 170 180 190 200 210 220 230 

966 1023 1053 1083 1113 1143 

I TFERSE YR - 1 PVM RTSIHNRLTODA-YMPIKVNASGGMAFMYVYTLLMFPQYIIILLRSIFPTNPDITSYNDYFSL 

: I = I III I I : : -MUM = = = =11 = 11 = 1= I I = II 

VIFVQQALRKIPVQYAKRLVGRNPVGGQSTHLPLKVNAAGVIPVIFALSLLIFPPTVAGLFGSDHPVAAWVIETFDY 

240 250 260 270 280 290 300 

1173 1203 1233 1263 1293 1323 1353 1383 

SSICGWIYMILMLVLSVAFTFVNIDPTKISEAMRESGDFIPNYRPGKETQSYLSKICYLFGTFSGFFMAFLGGVPLLFA 



THLIGMAVYALRIIGFTYFYAFIQVNPERMAENLKKQGGYIPGIRPGKATQTYITPILYRLTFVGSLFLAWAILPVFF- 
320 330 340 350 360 370 380 

1413 1440 1470 1500 1530 1560 1590 1620 

LGNDDLRTVSSMTGI-FMMITGMSFMILDEFQVIRIRKQYTSVFENEEN*CFILFHLGIMKIVLGMIIITCGISSRLMSV 
: || : | :=:: |::: : : : |:: | 

IKFADLPQAIQIGGTGLLIWGVALDTMKQIEAQLIKRSYKGFIK 
400 410 420 430 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



WO 02/34771 



PCT/GB01/04789 



Example 1299 

A DNA sequence (GBSx1377) was identified in S.agalactiae <SEQ ID 3989> which encodes the amino
acid sequence <SEQ ID 3990>. Analysis of this protein sequence reveals the following: 

Possible site: 24 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3002 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



>GP:AAF61315 GB:U96166 unknown [Streptococcus cristatus]
Identities = 30/78 (38%) , Positives = 41/78 (52%) 

Query: 276 ALTVTLTDDIWELEHLLQRCPNTDFHIAAPVYCSDRLKQLVGYPNYYLHEA.ITEEQFEVL 335 

AL +T +D + ++E LL + PN FHI A S L L YPN L+ I + L 
Sbjct: 289 ALILTNSDQLEQIEQLLTQLPNVHFHIGAITEMSGHLMGLNRYPNVSLYPNIRPAKVAEL 348 

Query: 336 LLNSDIYLDINHGEEVWN 353

D+YLDIN +E+ N 
Sbjct: 349 FERCDLYLDINISDEILN 366 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1300 

A DNA sequence (GBSx1378) was identified in S.agalactiae <SEQ ID 3991> which encodes the amino
acid sequence <SEQ ID 3992>. This protein is predicted to be eps7. Analysis of this protein sequence
reveals the following:

Possible site: 19 

>>> May be a lipoprotein 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAC07458 GB:AX009404 product = eps7 [Streptococcus thermophilus]

Identities = 87/232 (37%) , Positives = 133/232 (56%) , Gaps = 22/232 (9%) 

VSVI I P VYNAAPYLEGCVNT I LGQTYQVFEILLIDDGSTDTSAS I CDQLSLRDNRI RVFH 69 
+S++IPVYN Y++ C+++IL QT+ EI+L+DDGSTD S ICD S D RI+V H 
IS I VI PVXNVQDYI KKCLDSILSQTFSDIiEI IL VDDGSTDLSGRI CDYYSENDKRIKVIH 62 

IENGGASKARNFGIARISPESQEVrFvDSDDWVKENYLEVLLAQQEKYNADIVISNYYIY 12 9 

NGG S+ARN G+ + S+++TF+DSDD+V +Y+E L + +NADI I+++ 
TANGGQSEARNVGIKNAT- -SEWITFIDSDDYVSSDYIEYLYNLIQVHNADISIASF- - - 117 

RETEDIFGYYITDKDFV IEEISAQTAIDRQVHWHLNSSVFIVIWGKLYRRELFD 183 

YIT K + + + A+TAI R + LN + +WGK+YR E F+
Sbjct: 167 KYKFVSGKLFEDSLITYQIFSEASTIVFGAKDIYFYVNRKNSTVNGTFNIKK 218

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1301

A DNA sequence (GBSx1379) was identified in S.agalactiae <SEQ ID 3993> which encodes the amino
acid sequence <SEQ ID 3994>. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1569 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1302

A DNA sequence (GBSx1380) was identified in S.agalactiae <SEQ ID 3995> which encodes the amino
acid sequence <SEQ ID 3996>. Analysis of this protein sequence reveals the following: 

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1662 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1303

A DNA sequence (GBSx1381) was identified in S.agalactiae <SEQ ID 3997> which encodes the amino
acid sequence <SEQ ID 3998>. This protein is predicted to be a glycosyl transferase (gspA). Analysis of 
this protein sequence reveals the following: 

Possible site: 13 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2606 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF28363 GB:AF224467 putative glycosyl transferase [Haemophilus
ducreyi]

Identities = 62/177 (35%) , Positives = 105/177 (59%) , Gaps = 8/177 (4%) 

Query: 3 YARYYIPQLIDAEKVLYLDIDTLVA7DNLDKLFEIELGDYPIAAILD--GDGIY FN 55

+ RY+I 1+ +KV+YLD D +V +L +L++ ++ +Y +AA+ D + IY FN 
Sbjct: 89 FFRYFI SDFIEQDKyi YLDADI VWGSLTELYQTDISNYFIAAVKDI ISEKI YVNNHI FN 148 

Query: 56 SGVMLINSLYW^YRVTEKLLEITERELDNGIFGDQGvTjNLLFDNNWLKLEDKYNAQVGN 115 
+G++LIN+ W + +T+ L ++E+ +++ DQ +IOT1+F + WLKL YN +G

Sbjct: 149 AGMLLINNKKWREHNITQFCLSI1SEKYINSLPDADQSILNLIFKDKWLKI1NRGYNYLIGT 208 

Query: 116 DLGAFYENWQGYFDRNFES - PTI IHYCTHDKPWNTFSSSRFRETWWQYEQLDWNEVF 171 
D F Y + E+ P IIHY T KPW ++RFR +W Y +L+W +++ 

Sbjct: 209 DYLFFKYGKTRYLEDLGETIPLIIHYNTFAKPWIiNIFNTRFRNIYWFYYEIjNWQDIY 265

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1304

A DNA sequence (GBSx1384) was identified in S.agalactiae <SEQ ID 3999> which encodes the amino
acid sequence <SEQ ID 4000>. This protein is predicted to be a glycosyl transferase. Analysis of this 
protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1157 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF28363 GB:AF224467 putative glycosyl transferase [Haemophilus 
ducreyi] 

Identities = 103/259 (39%) , Positives = 156/259 (59%) , Gaps = 3/259 (1%)



IALAADFGYQEQVKTIIKSICFHNQFIDFYILNDDFPVEWFQMMEYHLSKMDCTISNTKI 66

I LAA+ Y E + T IKSI HN+ I FY+LN D+P EWF ++ L K++ I + K+ 

I VLAANQSYSEYILTTIKSIYLHNKHIRFYLIiNRDYPTEWFDIIJSn^KLRKLNSEIIDIKV' 69 

FNEEIKHFK-FQKPMPYPTYFRYFIPEVIHEDKVLYLDCDMIITSDLTSIFTLDISKYGV 125 

N+ IK+FK + T+FRYFI + I +DKV+YLD D+++ LT ++ DIS Y + 

TNDTIKNFKTYSHISSDTTFFRYFISDFIEQDKVIYLDADIWNGSLTELYQTDISNYFL 129 



AAV+D + E+ FN+G+LLINN VJRE I+Q L + + +L DQ +LN 



+ D WL+L+ YNY G D L+ + ++ + + +P +IHY T KPW + + R



+R+I+W Y L W+DI+ + 



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1305 

A DNA sequence (GBSx1385) was identified in S.agalactiae <SEQ ID 4001> which encodes the amino
acid sequence <SEQ ID 4002>. This protein is predicted to be a glycosyl transferase. Analysis of this
protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.2679 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF28363 GB:AF224467 putative glycosyl transferase [Haemophilus 
ducreyi] 

Identities = 94/263 (35%) , Positives = 158/263 (59%) , Gaps = 4/263 (1%) 

Query: 2 KKTIVLGADFQYRDQVMTTIKSIVSHNQHLTIYIINTDFPVEWFNILNHSLEQFDCRVKN 61

K IVL A+ Y 4- ++TTIKSI HN+H+ Y++N D+P EWF+ILN+ L +■ + + + 
KMNIVIAANQSYSEYILTTIKSIYLHNKHIRFYLI^DYPTEWFDILNNKIjRKMISEIID 66 

IPISSDVFEGIPTIjSHISV-AGFFRWFIPIHLEEEIVLYLDSDVIVRGSLDPLFDINLEE 120 
I +++D + T SHIS FFR+FI +E++ V+YIiD+D++V GSL L+ ++



h AV D S Y + FN+G++LINN W++ I + +++K +++ DQ

LN++ +++W+ + + YN IG D YG+ + P+I+HYN++ KPW

Query: 240 S( 

+R+R+ +W+Y+ L W 1YA+ 
SPTRFRNIYWFYYELNWQDIYAK 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1306 

A DNA sequence (GBSx1386) was identified in S.agalactiae <SEQ ID 4003> which encodes the amino
acid sequence <SEQ ID 4004>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2996 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10003> which encodes amino acid sequence <SEQ ID
10004> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC75095 GB:AE000294 putative Galf transferase [Escherichia coli K12] 
Identities = 68/286 (23%) , Positives = 122/286 (41%) , Gaps = 18/286 (6%) 



Query: 77 STRMDGI IAGLGRGDI WFQVPTWNSTEFDELFliDKLQAYGARI ITFVHDIVPLMFESNF 136
S ++ + GL D+++F P F +L + RI+ +HDI L
Sbjct: 50 SVKLSTFLCGLENKDVLIFNFPMAKPFWHILSFFHRLLKF- -RIVPLIHDIDELRGGGGS 107

Query: 137 YLLDRVIDWnjRSDWILPTKAMHDYLIEKGMTTSKVLYQEVWDHPVNIDLPRPEC- - -Q 193
D V D+VI M YL K M+ K+ +++D+ V+ D+ + Q
Sbjct: 108 - - -DSV- - RLATCDMVI SHNPQMTKYL- SKYMSQDKIKD I KI FDYLVSSDVEHRD vTDKQ 161

Query: 194 KVLSFAGDIQRFPFVNDWKENIPLIYYGDGSRLNSEANVHAQGWKDDVELMLSLSKRG-G 252
+ + +AG++ R + E +G ++ N G D + ++ G
Sbjct: 162 RGVIYAGNLSRHKCSFIYTEGCDFTLFG- -VNYENKDNPKYLG-SFDAQSPEKINLPGMQ 218

Query: 253 FGLCWSEDREELVERR- - -YSRMNASYKLSTFIAAGLPI IANHDISSRDFIKQHGLGFTV 309
FGL WDE Y + N+KS +L+ LP+ + DFI + +G+ V
Sbjct: 219 FGLIWDGDSvETCSGAFGDYLKFimPHKTSLYLSMELPVFITOKAALADFIVDNRIGYAV 278

Query: 310 ETLEEAVEKINNMEKETYDSYVENVEKIATLLRNGYITKKLLIDAV 355
+++E E +++M ETY EN 4- 1+ +R G ++!• + +
Sbjct: 279 GSIKEMQEIVDSMTIETYKQISENTKIISQKIRTGSYFRDVLEEVI 324

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1307 

A DNA sequence (GBSx1387) was identified in S.agalactiae <SEQ ID 4005> which encodes the amino
acid sequence <SEQ ID 4006>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3098 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA73093 GB:M76233 [Rabbit smooth muscle myosin light chain
kinase mRNA, complete CDS.], gene product [Oryctolagus cuniculus]

Identities = 23/63 (36%) , Positives = 36/63 (56%) 

Query: 5 QPAPALQRVRQCQPAPVLQPVPRCQPALALQRVRQCQPAQVLQQVPRCQPAQVLQQVPRC 64 

+PA L+ V +PA L+PV +PA L+ V +PA+ L+ V +PA+ L+ V 
Sbjct: 225 KPAETLKPVGNAKPAETLKPVGNAKPAETLKPVGNAKPAETLKPVGNAKPAETLKAVANA 284 

Query: 65 QPA 67 
+PA 

Sbjct: 285 KPA 287 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1308 

A DNA sequence (GBSx1388) was identified in S.agalactiae <SEQ ID 4007> which encodes the amino
acid sequence <SEQ ID 4008>. Analysis of this protein sequence reveals the following: 

Possible site: 43

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = [...] Transmembrane [...] ( [...] - 245)
INTEGRAL Likelihood = -7.[...] Transmembrane [...] ( [...] )
INTEGRAL Likelihood = -7.96 Transmembrane [...] ( [...] - 185)
INTEGRAL Likelihood = -7.96 Transmembrane [...] - 151 ( [...] - 185)
INTEGRAL Likelihood = -7.[...] Transmembrane [...] - 171 ( [...] - 185)
INTEGRAL Likelihood = -6.85 Transmembrane 15 - 31 ( 8 - 45)
INTEGRAL Likelihood = -4.09 Transmembrane 39 - 55 ( 35 - 57)
INTEGRAL Likelihood = -4.09 Transmembrane 63 - 79 ( 59 - 81)
INTEGRAL Likelihood = -2.71 Transmembrane 235 - 251 ( 235 - 251)
INTEGRAL Likelihood = -0.11 Transmembrane 253 - 269 ( 253 - 269)

Final Results 

bacterial membrane Certainty=0.4694 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC16164 GB:AF010496 ice nucleation protein [Rhodobacter capsulatus]
Identities = 85/286 (29%) , Positives = 119/286 (40%) , Gaps = 17/286 (5%) 

Query: 3 ALVLftDVJDALVETLVIiADWALIEALVLADIEALV EALVLADIEALVEALVLADID 58 

ALA ALT+ A++L AD+ L +AL A I AL + + A 

Sbjct: 523 ALSDAQAGALTSTQIGLLSTAAVKGLSTADMAGLTTAEAQALTSAQIAALSSSQIRAMTT 582 

Query: 59 ALVEALVLADIEALVEALVL ADIDALVEALVLADVEALIEALVLALVEALVLADVE 114 

A + AL A 1+ L + +L ADI AL A + I AL +LV A+ AD+ 

Sbjct: 583 AQIAALGTAQIKGLTASNILGLETADIVALTTTQAPALSSSQIAALSTSLVAAMETADLA 642 

Query: 115 ALIEALVLAL VEALVLADVEAL IEALVLALVEALVLADVEALIEALVLALVE 166 

LA +ALAA+ I+A++L AD+ AL A + + 

Sbjct: 643 KLSAATFKGFSSTQITALTTAQAGAIGTDQIAQITTAAIKGLESADIAALANATLAKMTT 702 

Query: 167 ALVIJUWFJUJIEALVIiADvD-ALvT^V^^ 225 

AV A+L ++LA V+AL A + L ++ AL AL V 

Sbjct: 703 AQVAVLGSAQLTGLTTTQINTVLTTAQVKALGAAALAGLGTDDIVALTTGQAAALSSTQV 762 

Query: 226 EALIIALVEALVIiADVDALMEALVLADVEALMEALVIADVD 271 

AL A + AL AD AL A + +AL +DAL A 

Sbjct: 763 AALSTAQISALQTADFAALSTAAIKGLSSTQITALSTGQIDALTTA 808 
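Each alignment in this listing is introduced by a summary line of the form "Identities = a/b (p%)". As a hedged illustration, the sketch below extracts the counts from such a line and recomputes the fractions; the function name `parse_alignment_stats` is an assumption made for the example. The printed percentages in the listing appear to be truncated rather than rounded for identities and gaps, and the recomputed fraction need not match the printed figure in every case.

```python
import re

STATS_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_stats(line):
    """Return {name: (count, length, fraction)} from an alignment summary line."""
    stats = {}
    for name, num, den in STATS_RE.findall(line):
        stats[name] = (int(num), int(den), int(num) / int(den))
    return stats


# Summary line of the alignment above.
line = "Identities = 85/286 (29%) , Positives = 119/286 (40%) , Gaps = 17/286 (5%)"
stats = parse_alignment_stats(line)
for name, (count, length, fraction) in stats.items():
    print(f"{name}: {count}/{length} = {fraction:.4f}")
```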

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1309 

A DNA sequence (GBSx1389) was identified in S.agalactiae <SEQ ID 4009> which encodes the amino
acid sequence <SEQ ID 4010>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2297 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1310 

A DNA sequence (GBSx1390) was identified in S.agalactiae <SEQ ID 4011> which encodes the amino
acid sequence <SEQ ID 4012>. This protein is predicted to be fimbriae-associated protein Fap1. Analysis of
this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3138 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA97453 GB:AB029393 streptococcal hemagglutinin [Streptococcus gordonii] 
Identities = 388/968 (40%) , Positives = 518/968 (53%) , Gaps = 68/968 (7%) 



Query: 13 VDTKSRVKMHKSEKNWTOTVMSHFNLFKAIKGRATVEADVCIQDVEKEDRLSSGNLTYLK 72
V+ +R K+ KS K+W+R S F L + +KG +V V +E + G L YLK
Sbjct: 13 VERVTRFKLIKSGKHWLRAATSQFGLLRLMKGADISSVEV KVAEEQSVEKGGLNYLK 69

Query: 73 GILAAGALVGGASLTSR-VYADETPWQEQSSSVPTLAEQTEVTV- -KTTTVQNHQDGTV 129
GI+A GA++GGA +TS VYA+E +++ + LA + E + + T+ +
Sbjct: 70 GI1ATGAVLGGAVVTSSSWAEEEQALEKVIDTRDVLATRGEAVLSEEAATTLSSEGANP 129

Query: 130 SKNIIDSNSVSMSESASTSTSESVSMSMSGSTLTSVSESVSTSALTSASESISTSASESV 189
+++ D+ S S S SA+ S S S+S+S S S S S S S+S S+SES S S S SV
Sbjct: 130 VESLSDTLSASESASAN-SVSTSISISESFSVSASASLSSSSSLSQSSSESASASESLSV 188

Query: 190 SKSTSISEVSNILETQASLTDKGRESFSANQIVTESSLVTDAGKNASVSSLIEITKPKSE 249
S STS S S TQ+S + S S+N + T S V+ +NA V + + +E
Sbjct: 189 SASTSQSFSSTTSSTQSSNNESLISSDSSNSLNTNQS-VSARNQNARVRTRRAVAANDTE 247

Query: 250 LQTSKMSNESLITPEKSQVMIASDKTGNESLTPTIRLKSVIQPRSMNLMTLSSEMDLIPL 309
K + + E + ++ T N + ++ N+ ++ LP
Sbjct: 248 APQVKSGDYWYRGESFEYY- -AEITDNSGQVNRWTR NVEGGANSTYLSPN 297

Query: 310 EEVSDTEMLGKDVSSELQKVNIALKDNTLSEPGTVKLDSSENLVLNFAFS1ASVNEGDVF 369
TE LG+ ++ +Q L+ E ++ + ++ + +A G+
Sbjct: 298 WVKYSTENLGRPGNATVQN - - - PLRTRIFGEVPLNEIVNEKSYYTRYI - -VAWDPSGN- - 350

Query: 370 TVKLSDNLDTQGIGTILKVQDIMDETGQLLATGSYSPLTHNITY TWTRYAST 421
++ DN + G+ + +E Y P ++TY T R A
Sbjct: 351 ATQMVDNANRNGLERFVLTVKSQNE KYDPAESSVTYVNNLSNLSTSEREAVA 402

Query: 422 LNNIKARVNMPVWPDQRI ISKTTSDKQCFTATLNNQVASIE ERVQYNSPS 471
A N+P P +1 ++ T DK T N V ++ S S
Sbjct: 403 AAVRAANPNIP--PTAKITVSQNGTVTITYPDKSTDTIPANRVVKDLQISKSNSASQSSS 460

Query: 472 VTEHTNVKTNVRSRIMKLDDERQTETYITQINPEGKEMYFASGLGNLYTIIGSDGTSGSP 531
V+ + T+V +1 ++ + + ++ S+ S S
Sbjct: 461 VRAROSAP-TflVCJAST- - -flARMflA,WflVST^ASTSASVSASESASTSASV , SASESASTS- 516

Query: 532 VNLLNAEVKILKTNSKNLTDSMDQNYDSPEFEDVTSQYSYTNDGSKITIDWKTNSISSTT 591
& V K++S + + S+++ + S+S + S+S++T
Sbjct: 517 ASVSASKSSSTSASVSASESASTSASVSASESASTSASVSASESASTSASVSAST 571

Query: 592 SYVVLVKIPKQSGVLYSTVSDlNQTYGSKYSyGHTNISGDSnANAEIKL-LSESASTSAS 650
S + ST + ++ + + S ++S A+ + SESASTSAS
Sbjct: 572 SASTSASVSASESA- -STSASVSASESASTS ASVSASESASTSASVSASESASTSAS 626

Query: 651 TSASTSASMSASTSASTSASMSASTSASTSASTSASMSASTSASTSASTSASTSASTSAS 710
CAS S+S SAS SAS SAS SAS SAS SASTSAS+SAST2ASTSAS SASTSASTSAS
Sbjct: 627 V^A<TR^^T^APA73APJ^AST^AOT^ 686

Query: 711 MSASTSASTSASTSASTSASTSASTSASMSASTSASTSASTSASTSASMSASTSASTSAS 770
+ C !ASTSASTSAS SAS SASTSAS SAS SASTSAS SASTSASTSAS+SASTSASTSAS
Sbjct: 687 VQAQTQ A <3TQA<3V^A^1?G.7A G/TG a 746

Query: 771 TSASTSASMSASTSASTSASTSASTSASMSASTSASTSASTSASTSASMSASTSASTSAS 830
SAS SAS SAS SASTSASTSAS SAS SASTSAS SAST ASTSAS+SAS SASTSAS
Sbjct: 747 VGaO.TPC AC'TC.AC.VC.AC.TG.APTC.aGV^AG.TrcziG.T'q AQ^70.AC.TVA^TQA^V^A^'R C .A C IT£;A 0 , 806

Query: 831 TSASMSASTSASTSASMSASTSASTSASMSASTSASTSASMSASTSASTSASMSASTSAS 890
SAS SASTSAS SAS SASTSAS SAS SASTSAS SAS SASTSAS SAS SASTSAS
Sbjct: 807 T7C7\ CTT G 7k GT^C 7A G^7G S CTC fl OTCS GT7GfiG'K , G7A CPGaCVGIi CTTQ QTC A G^G A QTTGA CTQAQ 866

Query: 891 MSATTSASTSVSTSASTSASTSASTSSSSSVTSNSSKEKVYSALPSTGDQDYSVTATALG 950
+SA+TSASTS S SAS SASTSAS S+S S ++++S SA S +T+
Sbjct: 867 VSASTSASTSASVSASESASTSASVSASESASTSASVSASESASTSASVSASESASTSAS 926

Query: 951 LGLMTGAT 958
+ T A+
Sbjct: 927 VSASTSAS 934

There is also homology to SEQ ID 760. 

SEQ ID 4012 (GBS68) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 33 (lane 4; MW 131.2kDa).

GBS68d was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 153 (lane 14; MW 103kDa) and in Figure 239 (lane 13; MW 103kDa). It was also 
expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 152 
(lane 17; MW 78kDa), in Figure 153 (lane 17; MW >78kDa) and in Figure 184 (lane 10; MW 78kDa). 
Purified GBS68d-GST is shown in Figure 246, lane 5. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1311 

A DNA sequence (GBSx1391) was identified in S.agalactiae <SEQ ID 4013> which encodes the amino
acid sequence <SEQ ID 4014>. This protein is predicted to be RofA. Analysis of this protein sequence 
reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1738 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10005> which encodes amino acid sequence <SEQ ID
10006> was also identified.

There is also homology to SEQ ID 3750. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1312 

A DNA sequence (GBSx1392) was identified in S.agalactiae <SEQ ID 4015> which encodes the amino
acid sequence <SEQ ID 4016>. This protein is predicted to be Nra. Analysis of this protein sequence
reveals the following:

Possible site: 16 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ ID 3750. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1313 

A DNA sequence (GBSx1393) was identified in S.agalactiae <SEQ ID 4017> which encodes the amino
acid sequence <SEQ ID 4018>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3674 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA27020 GB:M80215 uvs402 protein [Streptococcus pneumoniae] 
Identities = 577/663 (87%), Positives = 633/663 (95%), Gaps = 1/663 (0%)



Query: 1 MIDRKDTNRFKLVSKYSPSGDQPQAIETLVDNIEGGEKAQILKGATGTGKTYTMSQVIAQ 60
MI+ N+FKLVSKY PSGDQPQAIE LVDNIEGGEKAQIL GATGTGKTYTMSQVI++
Sbjct: 7 MINHITDNQFKLVSKYQPSGDQPQAIEQLVDNIEGGEKAQILMGATGTGKTYTMSQVISK 66

Query: 61 VNKPTLVIAHNKTLAGQLYGEFKEFFPDNAVEYFVSYYDYYQPEAYVPSSDTYIEKDSSV 120
WKPTLVIAHNKTLAGQLYGEFKEFFP+NAVEYFVSYYDYYQPEAYVPSSDTYIEKDSSV
Sbjct: 67 VNKPTLVIAHNKTLAGQLYGEFKEFFPENAVEYFVSYYDYYQPEAYVPSSDTYIEKDSSV 126

Query: 121 NDEIDKLRHSATSSLLERNDVIVVASVSCIYGLGSPKEYADSWSLRPGQEISRDQLIiNN 180
NDEIDKLRHSATS+LLERNDVIWASVSCIYGLGSPKEYADSWSLRPG EISRD+LI1N+
Sbjct: 127 NDEIDKLRHSATSALLERNDVIWASVSCIYGLGSPKEYADSWSLRPGLEISRDKLLND 186

Query: 181 LVDIQFERNDIDFQRGKFRVRGDVVEVFPASRDEHAFRIEFFGDEIDRIREIESLTGRVL 240
LVDIQFERNDIDFQRG+FRVRGDWE+FPASRDEHAFR+EFFGDEIDRIRE+E+LTG+VL
Sbjct: 187 LVDIQFERNDIDFQRGRFRVRGDVVEIFPASRDEHAFRVEFFGDEIDRIREVEALTGQvL 246

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Query: 241 GEVEHIAIFPATHFMTNDEHMEEA.ISKIQAEMENQVELFEKEGKLIEAQRIRQRTEYDIE 300 

GEV+HIAIFPATHF+TOD+HME AI+KIQAE+E Q+ +FEKEGKL+EAQR++QRTEYDIE 
Sbjct: 247 GEVDHLAIFPATHFVTNDDHMEVAIAKIQAELEEQLAVFEKEGKLLEAQRLKQRTEYDIE 306 

Query: 301 MLREMGYTNGVENYSRHMDGRSEGEPPFTLLDFFPEDFLIMIDESHMTMGQIKGMYNGDR 360

MLREMGYTNGVENYSRHMDGRSEGEPP+TLLDFFP+DFLIMIDESHMTMGQIKGMYNGDR 
Sbjct: 307 MLREMGYTNGVENYSRHMDGRSEGEPPYTLLDFFPDDFLIMIDESHMTMGQIKGMYNGDR 366 

Query: 361 SRKEMLVNYGFRLPSALDNRPLRREEFESHVHQIVYVSATPGDYEMEQTDTVVEQIIRPT 420 
SRK+MLVNYGFRLPSALDNRPLRREEFESHVHQIVYVSATPGDYE EQT+TV+EQI IRPT

Sbjct: 367 SRKKMLVNYGFRLPSALDNRPLRREEFESHVHQIVYVSATPGDYENEQTETVIEQIIRPT 426 

Query: 421 GLLDPEVEVRPSMGQMDDLLGEINLRTEKGERTFITTLTKRMAEDLTDYLKEMGVKVKYM 480 
GLLDPEVEVRP+MGQ+DDLLGEIN R EK ERTFITTLTK+MAEDLTDY KEMG+KVKYM 
Sbjct: 427 GLLDPEVEWPTMGQIDDLLGEINARVEKIffiRTFITTLTKKMAEDLTDYFKEMGIKVKYM 486

Query: 481 HSDIKTLERTEIIRDLRLGVFDVLIGINLLREGIDVPEVSLVAILDADKEGFLRNERGLI 540 

HSDIKTLERTEIIRDLRLGVFDVL+GINLLREGIDVPEVSLVAILDADKEGFLRNERGLI 
Sbjct: 487 HSDIKTLERTEIIRDLRLGVFDVLVGINLLREGIDVPEVSLVAILDADKEGFLRNERGLI 546 


Query: 541 QTIGRAARNSNGHVIMYADKITDSMQRAMDETARRRRLQMDYNEKHGIVPQTIKKEIRDL 600 

QTIGRAARNS GHVIMYAD +T SMQRA+DETARRR++QM YNE+HGIVPQTIKKEIRDL 
Sbjct: 547 QTIGRAARNSEGHVIMYADTVTQSMQRA.IDETARRRKIQMAYNEEHGIVPQTIKKEIRDL 606 

Query: 601 IAITKSNDSDKPEKWDYSSLSKKERQAEIKALQQQMQEAAELLDFELAAQIRDVTLELK 660

IA+TK+ ++ +K VD +SL+K+ER+ +K L++QMQEA E+LDFELAAQIRD++LE+K 
Sbjct: 607 IAVTKAVAKEE-DKEVDINSMKQERKELVKia^EKQMQEAVEVLDFElAAQIRDMMLEVK 665 

Query: 661 AID 663 
A+D

Sbjct: 666 ALD 668 

A related DNA sequence was identified in S. pyogenes <SEQ ID 4019> which encodes the amino acid
sequence <SEQ ID 4020>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4386 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 570/663 (85%) , Positives = 625/663 (93%) 


Query: 1 MIDRKDTNRFKLVSKYSPSGDQPQAIETLVDNIEGGEKAQILKGATGTGKTYTMSQVIAQ 60 

MID++D FKL SKY PSGDQPQAIE+LVDNIEGGEKAQIL GATGTGKTYTMSQVI++ 
Sbjct: 1 MIDKRDDKPFKLKSKYKPSGDQPQAIESLVDNIEGGEKAQILLGATGTGKTYTMSQVISK 60 

Query: 61 VNKPTLVIAHNKTIAGQLYGEFKEFFPDNAVEYFVSYYDYYQPEAYVPSSDTYIEKDSSV 120

VNKPTLVIAHNKTLAGQLYGEFKEFFPDNAVEYFVSYYDYYQPEAYVPSSDTYIEKDSSV 
Sbjct: 61 VNKPTLVIAHNKTIAGQLYGEFKEFFPDNAVEYFVSYYDYYQPEAYVPSSDTYIEKDSSV 120 

Query: 121 NDEIDKLRHSATSSLLERNDVIWASVSCIYGLGSPKEYADSWSLRPGQEISRDQLLNN 180 
NDEIDKLRHSATSSLLERNDVIWASVSCIYGLGSPKEYADS VSLRPGQEISRD LEN

Sbjct: 121 NDEIDKLRHSATSSLLERNDVIWASVSCIYGLGSPKEYADSAVSLRPGQEISRDTLLNQ 180 

Query: 181 LVDIQFERNDIDFQRGKFRTOGDVVEvPPASRDEHAFRIEFFGDEIDRIREIESLTGRVL 240 
LVDIQFERNDIDFQRG FRVRGDWEVFPASRDEHAFR+EFFGDEIDRI EIESLTG+ + 
Sbjct: 181 LVDIQFERNDIDFQRGCFRVRGDWEVFPASRDEHAFRVEFFGDEIDRICEIESLTGKTI 240

Query: 241 GEVEHLAIFPATHFMTNDEHMEEAISKIQAEMENQVELFEKEGKLIEAQRIRQRTEYDIE 300 

GEV+HL +FPATHF+TNDEHME++I+KIQAE+ Q++LFE EGKL+EAQR+RQRTEYDIE 
Sbjct: 241 GEVDHLVLFPATHFVTNDEHMEQSIAKIQAELAEQLQLFESEGKLLEAQRLRQRTEYDIE 300 

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MLREMGYT+GVENYSRHMDGRS GEPP+TLLDFFPEDFLIMIDESHMTMGQIKGMYNGD+ 



+RK+MLV+YGFRLPSALDNRPLRREEFESHVHQIVYVSATPG+YEM QT+T++EQI IRPT 



GLLDPE++VR SMGQMDDLLGE IN R + ERTFITTLTK+MAEDLTDYLKEMGVKVKYM 



HSDIKTLERTEIIRDLRLGVFDVLIGINLLREGIDVPEVSLVAILDADKEGFLRNERGLI 



QTIGRAARN +GHVIMYADK+TDSMQRA+DETARRR +Q+ YN+ HGIVPQTIKK+IR L 



I+I+K++ +D ++ +DY S+S+ ER+ I ALQ+QMQEAAELLDFELAAQ+RD+ILELK 



+D 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1314 

A DNA sequence (GBSx1394) was identified in S.agalactiae <SEQ ID 4021> which encodes the amino
acid sequence <SEQ ID 4022>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence



Query: 301 Sbjct: 301
Query: 361 Sbjct: 361
Query: 421 Sbjct: 421
Query: 481 Sbjct: 481
Query: 541 Sbjct: 541
Query: 601 Sbjct: 601
Query: 661 Sbjct: 661



INTEGRAL Likelihood = -11.78 Transmembrane 284 - 300 ( 274 - 303)

INTEGRAL Likelihood = -10.08 Transmembrane 20 - 36 ( 16 - 53)

INTEGRAL Likelihood = -5.52 Transmembrane 117 - 133 ( 114 - 137)

INTEGRAL Likelihood = -5.15 Transmembrane 203 - 219 ( 201 - 225)

INTEGRAL Likelihood = -3.29 Transmembrane 183 - 199 ( 182 - 200)

INTEGRAL Likelihood = -1.54 Transmembrane 74 - 90 ( 73 - 90)

INTEGRAL Likelihood = -0.48 Transmembrane 37 - 53 ( 37 - 53)



Final Results 

bacterial membrane Certainty=0.5713 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA22372 GB:AL034446 putative transmembrane protein 
[Streptomyces coelicolor A3 (2) ] 
Identities = 58/190 (30%), Positives = 96/190 (50%), Gaps = 11/190 (5%) 

Query: 114 GWS--IGFILFSISVITAYILGGLDFHSYDVSK-ATIFYWTLLPFWLIQSGTEELLTRG 170 

GW IGF LF +VIT G Y+V ++ + L+ F + TEE++ RG 

Sbjct: 98 GWGTLIGFGLFG-AVITNLFASGY YEVDGLGSVQGAIGLVGFMARAAATEEWFRG 152 

Query: 171 WLLPLINHRFHLAVAIGVSSTLFGILHLVNAHVTFLSIVSI-ICSGVLMSLYMIKSGNIW 229 

L +1 +A+G++ +FG++HL+N T ++I I +G +++ + N+W 

Sbjct: 153 VLFRIIEEHIGTYlALGLTGLVFGLMHIJjNEDATLWGAI^IAIFAGFMLAAAYAATRNLW 212 



Query: 230 SvAALHGATOFSQGNLYGIAVSGQKAGASLLHFTVKENAPDWISGGAFGIEGSLISIFVL 289 
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+H WNF+ G ++ VSG LL T+ + P ++GG FG EGS+ S+ 

Sbjct: 213 LTIGVHFGWNFAAGGVFSTWSGNGDSEGLLDATM--SGPKLLTGGDFGPEGSVYSVGFG 270 

Query: 290 LAAIIYLLWL 299 

+ + LWL 
Sbjct: 271 VLLTLVFLWL 280 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1315 

A DNA sequence (GBSx1395) was identified in S.agalactiae <SEQ ID 4023> which encodes the amino
acid sequence <SEQ ID 4024>. This protein is predicted to be glutamine-binding periplasmic 
protein/glutamine transport system perme. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -8.97 Transmembrane 532 - 548 ( 523 - 553)

INTEGRAL Likelihood = -7.38 Transmembrane 700 - 716 ( 696 - 720) 

INTEGRAL Likelihood = -4.57 Transmembrane 562 - 578 ( 558 - 588) 

INTEGRAL Likelihood = -0.32 Transmembrane 665 - 681 ( 665 - 681) 
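The INTEGRAL lines above each give a likelihood score, a predicted transmembrane core and, in parentheses, an extended range. A minimal sketch of how such lines could be turned into structured records follows; the `Segment` type and `parse_segments` function are names invented for this example, and the field names for the parenthesised range are an interpretation, not terminology from the source.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int        # first residue of the predicted transmembrane core
    end: int          # last residue of the predicted transmembrane core
    outer_start: int  # start of the parenthesised range
    outer_end: int    # end of the parenthesised range


SEG_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return one Segment per well-formed INTEGRAL line in the text."""
    return [
        Segment(float(lik), int(a), int(b), int(c), int(d))
        for lik, a, b, c, d in SEG_RE.findall(text)
    ]


# Three lines copied from the listing above.
sample = """
INTEGRAL Likelihood = -7.38 Transmembrane 700 - 716 ( 696 - 720)
INTEGRAL Likelihood = -4.57 Transmembrane 562 - 578 ( 558 - 588)
INTEGRAL Likelihood = -0.32 Transmembrane 665 - 681 ( 665 - 681)
"""
segments = sorted(parse_segments(sample), key=lambda s: s.start)
for seg in segments:
    print(seg.start, seg.end, seg.end - seg.start + 1, seg.likelihood)
```

Sorting by start position gives the segments in sequence order rather than in order of score, which is usually the more convenient view when sketching a membrane topology.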



Final Results 

bacterial membrane Certainty=0.4588 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF16724 GB:AF141644 putative integral membrane protein
[Lactococcus lactis] 
Identities = 109/195 (55%) , Positives = 156/195 (79%) , Gaps = 4/195 (2%) 



Query: 466 KMFNNGLASLKKSGEYDKLVKKYLSTASTSSNDKAAKPVDESTILGLISNNYKQLLSGIG 525
+MFNNGLA+L+ +GEYDK++ KYL++ T + +AK E+T G++ NN++Q+ G+
Sbjct: 1 EMFNNGLANLRANGEYDKI IDKYLAS -DTKTIQSSAK- - -ENTFFGILQNNWEQIGRGLL 56

Query: 526 TTLSLTLISFAIAMVIGIIFGMMSVSPSNTLRTISMIFVDIVRGIPLMIVAAFIFWGIPN 585
TL L ++SF +AM++GIIFG+ SV+PS LRTI+ I+VD+ R IPL+++ FIF+GIPN
Sbjct: 57 VTLELAVLSFILAMIVGIIFGLFSVAPSKILRTIARIYVDLNRSIPLLVLTIFIFYGIPN 116

Query: 586 LIESITGHQSPINDFVAATIALSLNGGAYIAEIVRGGIEAVPSGQMEASRSLGISYGKTM 645
L++ ITGHQSP+N+F A IAL+LN AYIAEIVR G++AVPSGQMEASRSLG++Y +M
Sbjct: 117 LLQIITGHQSPLNEFTAGVIALTLNSSAYIAEIVRSGVQAVPSGQMEASRSLGVTYLTSM 176

Query: 646 QKVILPQAVRLMLPN 660
+KVTLPQA+++ +P+
Sbjct: 177 RKVILPQAIKITIPS 191

There is also homology to SEQ ID 1198.

A further related DNA sequence was identified in S.pyogenes <SEQ ID 9071> which encodes amino acid 
sequence <SEQ ID 9072>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> May be a lipoprotein

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS sequences follows: 

Score = 80.8 bits (196), Expect = 2e-17 

Identities = 64/233 (27%) , Positives = 113/233 (48%) , Gaps = 13/233 (5%) 

Query: 34 IKKTRKLWAVSPDYAPFEFKALVNGKDTIVGADVQLAQAIADELDVDLELSPMSFDNVL 93 

+K + K+V S +APFE++ NGK G D++L + IA + L++S FD L 
Sbjct: 268 VKPSYKI VSDSS - - FAPFEYQ NGKGKYTGFDMELIKKIAKQQGFKLDI SNPGFDAAL 322 

Query: 94 SSLQTGKADLAI SGISHTKERAKVYDFS I PYYQAENAI VMRASDAKVTKNI SDLNGKKVA 153 

+++Q+G+AD I+G + T+ R K++DFS PYY +++++ K+ DL GK V 

Sbjct: 323 NAVQSGQADGVIAGATITEARQKIFDFSDPYY--TSSVILAVKKGSNVKSYQDLKGKTVG 380 

Query: 154 AQKGSIEEGLVKIQLPKANLISLTAMGEA INELKAGQVYAVTLEAPVAAGFLAQHKD 210

A+G+ + KN +AEA + + +G + A+ + VA+Q + 

Sbjct: 381 AKNGTASYTWLSDHADKYN-YHVKAFDFAST^DSMNSGSIDALMDDEAVLAYAINQGRK 439 

Query: 211 LALAPFSLKTSDGDAKAVALPKNSGDLTKAVKKVIAKLDEQERYKSFIAETIA 263 
P + S GD + +L K N +A L + Y + + ++

Sbjct: 440 FE-TPIKGEKS-GDIGFAVKKGANPELIKMFNNGLASLKKSGEYDKLVKKYLS 490

Score = 74.5 bits (180), Expect = 1e-15

Identities = 59/215 (27%) , Positives = 102/215 (47%) , Gaps = 12/215 (5%) 

Query: 48 YAPFEFKALVNGKDT I VGADVQLAQAI ADELDVDLELS PMS FDNVLSSLQTGKADLAI SG 107
YAPFEFK + T G DV + +A ++ ++ FD ++++Q+G+AD ++G 

Sbjct: 36 YAPFEFK DSDQTYKGIDVDIVNEVAKRAGWNVNMTYPGFDAAVNAVQSGQADALMAG 92 

Query: 108 ISHTKERAKVTOFSIPYYQAENAIvMRASDAKVTKNISDLNGKKVAAQKGSIEEGLVKIQ 167 
+ T+ R KV++FS YY +1+ ++ KVT N L GK V + G+ + ++

Sbjct: 93 TTVTEARKKVFNFSDTYYDT-SVILYTKNNNKVT-N^ 

Query: 168 LPKANLISLTAMGEAI - -IffiLKAGQVYAVTLEAPVAAGFLAQHKDLALAPFSLKTSDGDA 225 
K T + N L +G +YA + PV + Q K A+ +++ + 
Sbjct: 151 KSKYGYKVKTFDTSDLMNNSLDSGS I YAAMDDQP WQFAINQGKAYAI NMEGEAVGS 207

Query: 226 KAVALPKNSG- -DLTKAVNKVIAKLDEQERYKSFI 258 

A A+ K SG +L K N A++ Y + 

Sbjct: 208 FAFAVKKGSGHDNLIKEFNTAFAQMKSDGTYNDIM 242 


SEQ ID 4024 (GBS 154) was expressed in E.coli as a His-fusion product. The purified protein is shown in 
Figure 199, lane 6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1316

A DNA sequence (GBSx1396) was identified in S.agalactiae <SEQ ID 4025> which encodes the amino
acid sequence <SEQ ID 4026>. This protein is predicted to be amino acid ABC transporter, ATP-binding
protein (glnQ). Analysis of this protein sequence reveals the following:

Possible site: 60 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4183 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB90561 GB:AE001058 glutamine ABC transporter, ATP-binding 
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protein (glnQ) [Archaeoglobus fulgidus] 
Identities = 147/240 (61%) , Positives = 192/240 (79%) 

Query: 5 KIDVQDLHKSYGQNEVLKGIDAKFYEGDWCIIGPSGSGKSTFLRTLNLLESITSGKWV 64 
++++ DLHK +G+ EVLKG+ K +G+W IIGPSGSGKST LR +N LE TSGK+++

Sbjct: 3 QLEIIDLHKRFGELEVLKGVTMKVEKGEvWIIGPSGSGKSTLLRCINRLEEPTSGKILL 62 

Query: 65 DGFELSNPKTDIDKARENIGMVFQHEmFPHMSVLENITFAPIELGKESKEAAEKHGMEL 124 
DG +++N K DI+K R+ IG+VFQ FNLFPH++ L+N+T API++ K SK AE+ GM L 
Sbjct: 63 DGVDITNSKIDINKVRQRIGIVFQQFNLFPHLTALQNTOLAPIKIKKMSKREAEELGMRL 122

Query: 125 LEKVGLADKANAKPDSLSGGQKQRVAIARSLAMNPDILLFDEPTSALDPEMVGDVIJSrVMK 184 

LEKVGL DKA+ P LSGGQ+QRVAIAR+LAMNP+++LFDE TSALDPE+V +VL+VMK 
Sbjct: 123 LEKVGLEDKADYYPAQLSGGQQQRVAIARAIAMNPEVMLFDEVTSALDPELVKEVLDVMK 182 


Query: 185 DIAEQGMTMLIVTHEMGFARQVANRVIFTDGGRFLEDGTPEQIFDTPQHPRLQDFLNKVL 244 

LA GMTM+ + VTHEMGFAR+V +RVIF DGG +E+G PEQIF P+H R + FL+ +L 
Sbjct: 183 QLARDGMTMVVVTHEMGFAREVGDRVIFMDGGVIvEEGKPEQIFSNPKHERTRKFLSMIL 242 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4027> which encodes the amino acid
sequence <SEQ ID 4028>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4149 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB05180 GB:AP001512 ABC transporter (substrate-binding protein) 
[Bacillus halodurans] 
Identities = 79/227 (34%) , Positives = 126/227 (54%) , Gaps = 10/227 (4%) 

Query: 35 KKTRKLWAVSPDYAPFEFKALVNGKDTIVGADVQLAQAIADELDVDLELSPMSFDNVLS 94

+K LV+ S DY P+E + G+ IVG DV +A+ I EL +L++ M F+ ++ 
Sbjct: 48 EKKS VLVMGTSADYPPYES VDVTTGE- - IVGFDVDIAEYITSELGYELKIQDMDFNGIIP 105 

Query: 95 SLQTGKADLAISGISHTKERAKVYDFSIPYYQAFJ^AIVMRASDAKVTKNISDIjNGKKVAA 154 
+LQ G+ D A+SG++ T+ER K DFS YY A+N +V + D ++ DL GK V

Sbjct: 106 ALQAGRVDFALSGMTPTEERKKSVDFSDVYYDAQNLWFKEEDG- -LSSVEDLAGKTVGV 163 

Query: 155 QKGS I - EEGLVKIQ - - LPKANL I SLTAMGEAINELKAGQVYAVTLEAPVAAGFLAQHKDL 211 
Q SI EE V++Q L + + + E + EL AG+V A+ +E VAAG L + 
Sbjct: 164 QLASIQEEAAVELQEELDGLTIETRNRVPELVQELLAGRVDALIIEDTVAAGHLEANP-- 221

Query: 212 ALAPFSLKTSDGDAKAVALPKNSGDLTKAVNKVIAKLDEQERYKSFI 258 

L F++++ A+A PK+S +LT+ N+ + ++ E +1 

Sbjct: 222 GLVRFAIESEGETGSAIAFPKDS-ELTEPFNEKLQEMMEDGTMEELI 267 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 223/246 (90%) , Positives = 238/246 (96%) 

Query: 1 MAELKIDVQDLHKSYGQNEVLKGIDAKFYEGDWCIIGPSGSGKSTFLRTLNLLESITSG 60 
M ELKIDVQDLHKSYGQNEVLKGIDAKFYEGDWCIIGPSGSGKSTFLRTLNLLE+ITSG

Sbjct: 1 MTELKIDVQDLHKSYGQNEVLKGIDAKFYEGDWCIIGPSGSGKSTFLRTLNLLETITSG 60 

Query: 61 KVVvDGFELSNPKTDIDKARENIGMVFQHFNLFPHMSVLENITFAPIELGKESKEAAEKH 120 
KV+ VDGFELS +PKT+ IDKARENIGMVFQHFNLFPHM+ VLENI FAP+ELGKESKE A+KH 
Sbjct: 61 KVMVDGFELSDPKTNIDKARENIGlWFQHFNLFPHMTvLF^IIFAPvELGKESKEVAKKH 120

Query: 121 GMELLEKVGIjADKANAKPDSLSGGQKQRVAIARSIAMNPDILLFDEPTSALDPEMVGDVL 180 

GM LLEKVGL+DKA+A P SLSGGQKQRVAIARSLftMNPDI+LFDEPTSALDPEMVGDVL 
Sbjct: 121 GMALLEKVGLSDKADAFPGSLSGGQKQRVAIARSLAMNPDIMLFDEPTSALDPEMVGDVL 180 



WO 02/34771 



PCT/GB01/04789 



-1450- 



Query: 181 NVMKDLAEQGMTMLIVTHEMGFARQVANRVIFTDGGRFLEDGTPEQIFDTPQHPRLQDFL 240 

NVMKDIoAEQGMTMLIVTHEMGFARQVANRVIFTDGG+FLEDGTPE+IFD P+HPRL +FL 
Sbjct: 181 NVMKDLAEQGMTMLIVTHEMGFARQVMJRVIFTDGGQFLEDGTPEEIFDHPKHPRLIEFL 240 


Query: 241 NKVLNV 246 

+KVLNV 
Sbjct: 241 DKVLNV 246 
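In alignments of this kind the middle line repeats the residue where query and subject are identical and shows "+" where the substitution scores as conservative, so identities and positives can be counted directly from it. The sketch below applies this to the final block of the alignment above; `count_midline` is a hypothetical helper, and the approach only holds when the three lines are still column-aligned, which is not the case for many blocks in this scanned listing.

```python
def count_midline(midline):
    """Count identities (letters) and positives (letters plus '+') in a match line."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


# Final block of the alignment above (residues 241 to 246).
query = "NKVLNV"
midline = "+KVLNV"
sbjct = "DKVLNV"

identities, positives = count_midline(midline)

# Cross-check: identities should equal the number of matching columns.
matching_columns = sum(q == s for q, s in zip(query, sbjct))
print(identities, positives, matching_columns)
```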

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1317 

A DNA sequence (GBSx1397) was identified in S.agalactiae <SEQ ID 4029> which encodes the amino
acid sequence <SEQ ID 4030>. Analysis of this protein sequence reveals the following:

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2311 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4031> which encodes the amino acid
sequence <SEQ ID 4032>. Analysis of this protein sequence reveals the following:

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2702 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 45/51 (88%), Positives = 49/51 (95%)

Query: 1 MGDKPISFRDKDGNFVSAADvTOJAEKLEELFNTLNPNRKLRLEREKLAKEK 51 

MGDKPISF+DKDGNFVSAADVWNAEKLEELFN LNPNR+LRLEREKL K++ 
Sbjct: 11 MGDKPISFKDKDGNFVSARDvWNAEKLEEIiFNLLNPNRRLRLEREKLKKDE 61 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1318 

A DNA sequence (GBSx1398) was identified in S.agalactiae <SEQ ID 4033> which encodes the amino
acid sequence <SEQ ID 4034>. This protein is predicted to be spo0B-associated GTP-binding protein (obg).
Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2967 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
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The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14752 GB:Z99118 GTPase activity [Bacillus subtilis] 
Identities = 297/435 (68%) , Positives = 345/435 (79%) , Gaps = 7/435 (1%) 


Query: 3 MFLDTAKI SVKAGRGGDGMVAFRREKYVPNGGPWGGDGGKGGSVI FKVNEGLRTLMDFRY 62 

MF+D K+ VK G GG+GMVAFRREKYVP GGP GGDGGKGG V+F+V+EGLRTLMDFRY 
Sbjct: 1 MFVDQVKOTVKGGI)GGNGWAFRREKYVPKGGPAGGDGGKGGDWFEVDEGLRTLMDFRY 60 

Query: 63 NRNFKAKAGEKGMTKGMHGRGAEDLIVSLPPGTTVRDATTGKVITDLVEHDQEFWARGG 122

++FKA GE GM+K HGR A+D+++ +PPGT V D T +VI DL EH Q V+ARGG 
Sbjct: 61 KKHFKAIRGEHGMSKNQHGRNADDMVIKVPPGTVVTDDDTKQVIADLTEHGQRA.VIARGG 120 

Query: 123 RGGRGNIRFATPRNPAPEIAENGEPGEERELQLELKILADVGLVGFPSVGKSTLLSWSA 182 
RGGRGN RFATP NPAP+++ENGEPG+ER + LELK+LADVGLVGFPSVGKSTLLSWS+

Sbjct: 121 RGGRGNSRFATPANPAPQLSENGEPGKERYIVLELKVLADVGLVGFPSVGKSTLLSWSS 180 

Query: 183 AKPKIGAYHFTTIVPNLGMVRTKSGDSFAMADLPGLIEGASQGVGLGTQFLRHIERTRVI 242 
AKPKI YHFTT+VPNLGMV T G SF MADLPGLIEGA QGVGLG QFLRHIERTRVI 
Sbjct: 181 AKPKIADYHFTTLVPNLGMVETDDGRSFVMADLPGLIEGAHQGVGLGHQFLRHIERTRVI 240

Query: 243 LHVIDMSASEGRDPYDDWSIMffiLETYNLRLIffiRPQIIVANKMDMPDSEEHLAAFKEKL 302 

+HVIDMS EGRDPYDDY++IN EL YNLRL ERPQIIVANKMDMP++ ENL AFKEKL 
Sbjct: 241 VHVIDMSGLEGRDPYDDYLTINQELSEYNLRLTERPQII VANKMDMPEAAENLEAFKEKL 300 


Query: 303 AANYDEFDDMPMIFPISSIAHQGLFJ^MDATAELLANTEEFLLYDETDMQEDEAYYGFNE 362 

DD P +FPIS++ +GL L+ A L NT EF LYDE ++ ++ Y 
Sbjct: 301 T DDYP-VFPISAVTREGLRELLFEVANQLENTPEFPLYDEEELTQNRVMYTMEN 353 

Query: 363 DERPFEITRDDDATWVLYGDKLEKLFVMTNMERDESIMKFARQLRGMGVDEALRERGAKD 422

+E PF ITRD D +VL GD LE+LF MT+ RDES+ +FARQ+RGMGVDEALRERGAKD 
Sbjct: 354 EEVPFNITRDPDGVFVLSGDSJjERLFKMTDFSRDESVKRFARQMRGMGVDEALRERGAKD 413 

Query: 423 GDIVRIGNFEFEFVD 437 
GDI+R+ FEFEF+D

Sbjct: 414 GDIIRLLEFEFEFID 428 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4035> which encodes the amino acid
sequence <SEQ ID 4036>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2588 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 394/437 (90%) , Positives = 421/437 (96%) 


Query: 1 MSMFLDTAKI SVKAGRGGDGMVAFRREKYVPNGGPWGGDGGKGGS VI FKVNEGLRTLMDF 60 

MSMFLDTAKISV+AGRGGDGMVAFRREKYVPNGGPWGGDGGKGGSVIF+V+EGLRTLMDF 
Sbjct: 1 MSMFLDTAKISVQAGRGGDGMVAFRREKYVPNGGPWGGDGGKGGSVIFRVDEGLRTLMDF 60 

Query: 61 RYNRNFKAKAGEKGMTKGMHGRGAEDLIVSLPPGTTWDATTGKVITDLVEHDQEFWAR 120

RYNR FKAK+GEKGMTKGMHGRGAEDIiIV +P GTTVRDA TGKVITDLVEH QE V+A+ 
Sbjct: 61 RYNRKFKAKSGEKGMTKGmGRGAEDLIVFVPQGTTVRDAETGKVITDLVEHGQEVVIAK 120 

Query: 121 GGRGGRGNIRFATPRNPAPEIAENGEPGEERELQLELKILADVGLVGFPSVGKSTLLSW 180 
GGRGGRGNIRFATPRNPAPEIAENGEPGEER+L+LELKILADVGLVGFPSVGKSTLLSW

Sbjct: 121 GGRGGRGNIRFATPRNPAPEIAENGEPGEERQLELELKILADVGLVGFPSVGKSTLLSW 180 



Query: 181 SAAKPKIGAYHFTTIVPNLGMVRTKSGDSFAMADLPGLIEGASQGVGLGTQFLRHIERTR 240 
S+AKPKIGAYHFTTIVPNLGMVRTKSGDSFAMftDLPGLIEGASQGVGLGTQFLRHIERTR 
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Sbjct: 181 SSAKPKIGAYHFTTIVPNLGMVRTKSGDSFAMADLPGLIEGASQGVGLGTQFLRHIERTR 240 

Query: 241 VILHVIDMSASEGRDPYDDYVS INNELETYNLRLMERPQI I VANKMDMPDSEENLAAFKE 300 

VILHVIDMSASEGRDPY+DWSINlffiLETYNLRLMERPQIIVANKMD+P+++EML AFK+ 
Sbjct: 241 VILHVIDMSASEGRDPYEDYVSINNELETYNLRLMERPQI IVANKMDIPEAQENLKAFKK 300 

Query: 301 KLAANYDEFDDMPMIFPISSlMQGLFJMDATAELIiAI^EFLLyDETDMQEDEAYyGF 360 

KLAA YDEFDD+PMIFPISSLAHQGLENL++ATAELLA T+EFLLYDE+D+ ++EAYYGF 
Sbjct: 301 KIAAQYDEFDDLPMIFPISSLAHQGLENLLEATAELLAKTDEFLLYDESDLVDEEAYYGF 360 

Query: 361 I^DERPFEITRDDDATWVLYGDKLEKIjFVMTNMERDESIMKFARQLRGMGVDEALRERGA 420 

E E+ FEITRDDDATWVL G+KLE+LFVMTNMERDESIMKFARQLRGMGVDEALRERGA

Sbjct: 361 AETEKDFEITRDDDATWVLSGEKLERLFVMTOMERDESIMKFARQLRGMGVDEALRERGA 420 

Query: 421 KDGDIVRIGNFEFEFVD 437 

KDGD VRIG FEFEFVD 
Sbjct: 421 KDGDPVRIGKFEFEFVD 437 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
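
The alignment summaries in these examples all follow the same "Identities = a/b (p%)" pattern. As a minimal sketch (not part of the original analysis; the function and variable names are illustrative assumptions), such a line could be parsed as follows. The regular expression tolerates the stray spaces that text extraction tends to introduce, and the fraction is recomputed from the counts rather than trusting the printed percentage.

```python
import re

# Matches summary fields such as "Identities = 394/437 (90%)".
# Whitespace is tolerated everywhere because extracted text is irregular.
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_summary(line):
    """Return {field: (count, aligned_length, reported_percent)} for one line."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD_RE.findall(line)
    }


if __name__ == "__main__":
    summary = parse_alignment_summary(
        "Identities = 394/437 (90%) , Positives = 421/437 (96%)"
    )
    for name, (count, length, percent) in summary.items():
        # Print the recomputed fraction next to the percentage given in the text.
        print(f"{name}: {count}/{length} = {count / length:.3f} (reported {percent}%)")
```

Keeping the raw counts alongside the reported percentage makes it possible to compare alignments by fraction without depending on how the percentage was rounded.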

Example 1319 

A DNA sequence (GBSx1399) was identified in S.agalactiae <SEQ ID 4037> which encodes the amino
acid sequence <SEQ ID 4038>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S. pyogenes <SEQ ID 4039> which encodes the amino acid
sequence <SEQ ID 4040>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 30/42 (71%) , Positives = 37/42 (87%) 

Query: 1 MAFGDNGQRKKTGFEKLTLFWILMVLVTVGGLVFGAISAIM 42
+AFG+NG RKKT FEK+T+FWILMVLVTVGGL+ A+S +M
Sbjct: 1 VAFGENGPRKKTTFEKVTMFWILMVLVTVGGLIASALSVLM 42

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
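
The "Identities" figure reported for each alignment is the number of aligned columns in which the Query and Sbjct residues are the same. A minimal sketch of that count is given below; the function name and the short toy sequences are hypothetical and are not taken from the sequence listing. It assumes the two strings are already aligned to equal length, with "-" marking a gap.

```python
def count_identities(query, sbjct):
    """Count aligned columns where both sequences carry the same residue.

    Both strings must already be aligned (equal length, '-' for gaps).
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


if __name__ == "__main__":
    # Toy aligned pair (illustrative only).
    query = "MKT-LV"
    sbjct = "MRTALV"
    identical = count_identities(query, sbjct)
    print(f"Identities = {identical}/{len(query)}")
```

Because text extraction can drop or merge residues, a count recomputed from the printed alignment rows may differ from the figure given in the summary line.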

Example 1320 

A DNA sequence (GBSx1401) was identified in S.agalactiae <SEQ ID 4041> which encodes the amino
acid sequence <SEQ ID 4042>. Analysis of this protein sequence reveals the following: 
Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2484 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD28348 GB:AF102860 aminopeptidase PepS [Streptococcus thermophilus]
Identities = 247/413 (59%), Positives = 313/413 (74%)

Query: 1 MVLQDFDNLLKOAQLIISKGLNVQKGHTIjALTIDvEQvHLARLLTEAAYEKGTASEVIVD 60
MVL +F L+KYA+L+++ G+NVQ GHT+AL+ ID VEQ LA LL + AY GA+EVIV
Sbjct: 1 MVLPNFKENLEKYAKLLVTNGIWQPGHTVALSIDVEQAELAHLLVKEAYALGAAEVIVQ 60

Query: 61 YTDDFITRQRLLHASDEVLTNVPQYTVDKSrALUJKKASRLVVKSSNPNAFATVDPKRLS 120
++DD I R+R LHA + VP Y + LL KKASRL V+SS+P+AF V P+RLS
Sbjct: 61 WSDDTINRERFLHAEMNRIEEVPAYKKAEMEYLLEKKASRLGVRSSDPDAFNGVAPERLS 120

Query: 121 ETTRATAIALEEQSRAIQANKVSWNVAAAAGREWAALVFPELKTSDQQVDALWDTIFKLN 180
+A A + A Q+NKVSW VAAAAG+EWA VFP + ++ VD LW+ IFK
Sbjct: 121 AHAKAIGAAFKPMQVATQSNKVSWTVAAAAGKEWAKKVFPNASSDEEAVDLLWNQIFKTC 180

Query: 181 RIYEDDPIAAWDAHEAKLLEKATRLNQEQFDALHYTAPGTDLTLGMPKNHIWEAAGSLNA 240
R+YE DP+ AW H +L KA LN+ QF ALHYTAPGTDLTLG+PKNH+WE+AG++NA
Sbjct: 181 RVYEKDPVRAWKEHADRLDAKARILNFAQFSALHYTAPGTDLTLGLPKNHVWESAGAINA 240

Query: 241 QGETFIANMPTEEIFSAPDYRRADGYVTSTKPLSYAGVIIENMTFTFKDGKIINVTAEKG 300
QGE+F+ NMPTEE+F+APD+RRA GYV+STKPLSY G IIE + TFKDG+I+++TA++G
Sbjct: 241 QGESFLPNMPTEEVFTAPDFRRAYGYVSSTKPLSYNGNIIEGIKVTFKDGEIVDITADQG 300

Query: 301 QETVQRLIEEmG^SLGEVALVPHKTPISLSGLIFFNTLFDENASNHIAIGTAYAFNVE 360
++ ++ L+ N+GAR+LGE ALVP +PIS SG+ FFNTLFDENASNHLAIG AYA +VE
Sbjct: 301 EKVMKNLVFNNNGARALGECALVPDSSPISQSGITFFNTLFDENASNHLAIGAAYATSVE 360

Query: 361 GGTEMTSQELDEAGLNRSSTHVDFMIGSEQMDIDGIRADGTAVPIFRNGEWAI 413
GG +MT +EL AGLNRS HVDF+IGS QM+IDGI DG+ VPIFRNG+W I
Sbjct: 361 GGADMTEEELKAAGLNRSDVHVDFIIGSNQMNIDGIHHDGSRVPIFRNGDWVI 413

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
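
Each "Final Results" block lists three candidate localisations with a certainty value and a verdict. The following is a minimal sketch, assuming only the line layout visible in this text, of how such a block could be read and the highest-scoring localisation selected. The function names are illustrative assumptions; the pattern deliberately tolerates spaces inside the decimal number and stray punctuation before "Certainty", both of which occur in extracted text.

```python
import re

# One result line, e.g. "bacterial cytoplasm Certainty=0 . 2484 (Affirmative) < suco".
RESULT_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\w*\s*=\s*"
    r"([0-9.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {localisation: (certainty, verdict)} for one Final Results block."""
    results = {}
    for site, raw_value, verdict in RESULT_RE.findall(text):
        # Remove whitespace that extraction inserted inside the number.
        certainty = float(re.sub(r"\s+", "", raw_value))
        results[site] = (certainty, verdict.strip())
    return results


def most_likely_site(results):
    """Return the localisation with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


if __name__ == "__main__":
    block = """
    bacterial cytoplasm Certainty=0 . 2484 (Affirmative) < suco
    bacterial membrane Certainty=0 . 0000 (Not Clear) < suco
    bacterial outside Certainty=0 . 0000 (Not Clear) < suco
    """
    parsed = parse_final_results(block)
    print(most_likely_site(parsed), parsed)
```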

Example 1321 

A DNA sequence (GBSx1403) was identified in S.agalactiae <SEQ ID 4045> which encodes the amino
acid sequence <SEQ ID 4046>. Analysis of this protein sequence reveals the following: 

Possible site: 33

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.91 Transmembrane 661 - 677 ( 657 - 680)

Final Results

bacterial membrane Certainty=0.4163 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 8787> which encodes amino acid sequence <SEQ ID 8788> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7
McG: Discrim Score: 6.47
GvH: Signal Score (-7.5): 1.01

Possible site: 29
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 1 value: -7.91 threshold: 0.0

INTEGRAL Likelihood = -7.91 Transmembrane 658 - 673 ( 657 - 680)
PERIPHERAL Likelihood = 4.35 555
modified ALOM score: 2.08

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.4163 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

LPXTG motif: 647-651 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF09821 GB:AE001885 6-aminohexanoate-cyclic-dimer hydrolase 
[Deinococcus radiodurans] 
Identities = 150/497 (30%) , Positives = 233/497 (46%) , Gaps = 32/497 (6%) 

Query: 110 LTEETYKQKDGQDIAM^SGQVTSEELVMiiAYDIIAKENPSLNAVITTRRQEAIEEARK 169 

LT Y + D DLA + R G++++E++ A N +LNAV+ + + +AR 

Sbjct: 45 LTFAEYDRLDALDIAQLFRRGELSAEDMCTAAIHRAQVVWAI^AWYPLYDQGLAQARA 104 

Query: 170 L KDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFDSSYVKKYKDLG 222 

+ PF GVP LVK G + G G +1 +D V++++ G 

Sbjct: 105 TDAARARGEQATGPFAGVPFLVKDFGSRLAGVPHTGGTRAYRDQIPEWDDELVRRWQAAG 164 

Query: 223 FIILGQTNFPEYGWRNITDSKLYGLTHNPWDLAHNAGGSSGGSAAAIASGMTPIASGSDA 282 

+ LG+TN PE+ +T+ +L+G T NPWDL GGSSGGSA+A+A+G+ P+A D 
Sbjct: 165 LLPLGKTNTPEFALMGVTEPELHGPTRNPWDLGRTPGGSSGGSASAVAAGIVPLAGAGDG 224 

Query: 283 GGS IRI PSSWTGLVGLKPTRGLV SNEKPDSYSTAVHFPLTKSSRDAETLLTYLKKSD 339 

GGSIRIP+S GL GLKP+RG V AV LT+S RD+ LL + D 

Sbjct: 225 GGSIRIPASCCGLFGLKPSRGRVPCGDGVGEPWQGAAVEHVLTRSVRDSAALLDLEQGPD 284 

Query: 340 QTLVSV NDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFK 386 

+ L I ++ P+G V + A+ L G + 

Sbjct: 285 AGAALFLPSPERPYSEEVGREPGRLRIGFSTAHPLGRSVHPECVAAVQGAARLLESLGHE 344 

Query: 387 VTEIDLPIDGRALMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSD 446 

V E+ LP DG AL + + L G GA +D DV+ +TW + + ++ 

Sbjct: 345 VEEVALPWDGPAIAQAFLMLYFGETGASLAALRDTLGRPARASDVEAVTWLLGQLGRSYS 404 

Query: 447 KAELKKSIMEAQKHMDDYRKAMEKLHKQFPIFLSPTTASLAPIjNTDPY VTEEDKRA 502 

A+ A+ + + +AM + H+ + + L+P A+ PL V RA 

Sbjct: 405 AAD FAAARASWNVHARAMGRFHQNYDLLLTPVLAT-PPLQIGELQPRGVQAALLRA 459 

Query: 503 IYNMENLSQEERIALFNRQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMA 562 

M+ R + +L + P+TQ+AN+TG PA+S+P + + GLP+G +A 

Sbjct: 460 AQQMDVSGLLRRSGQVDAI^TDILEKMPYTQLANLTGQPAMSVPLHWTADGLPVGVQFVA 519 

Query: 563 GANYDMVLIKFATFFEK 579 

+ VL++ A E+ 
Sbjct: 520 PLAREDVLLRLAGQLEQ 536 

There is also homology to SEQ ID 4048. 

SEQ ID 8788 (GBS173) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 41 (lane 5; MW 96.8kDa). 

The GBS173-GST fusion product was purified (Figure 116A; see also Figure 201, lane 7) and used to
immunise mice (lane 1+2 product; 15 µg/mouse). The resulting antiserum was used for Western blot, FACS,
and in the in vivo passive protection assay (Table III). These tests confirm that the protein is
immunoaccessible on GBS bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1322 

A DNA sequence (GBSx1404) was identified in S.agalactiae <SEQ ID 4049> which encodes the amino
acid sequence <SEQ ID 4050>. This protein is predicted to be ribosomal large subunit pseudouridine 
synthase B (rsuA). Analysis of this protein sequence reveals the following: 

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3674 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06992 GB:AP001518 16S pseudouridylate synthase [Bacillus halodurans] 
Identities = 110/236 (46%) , Positives = 149/236 (62%) , Gaps = 4/236 (1%) 

Query: 1 MRLDKFLVECGLGSRTQVKLILKKKQISVNGNSETSPKVQVDEYRDEIKYNGTLVSYEKF 60 

MR+DKFL G GSR VK +LK +VG P V+ +1 GVY+ + 

Sbjct: 1 MRIDKFIANMGFGSRKDVKKLLKTGAVRVQGQPIKDPSTHVEPESESITVYGEEVEYKPY 60 

Query: 61 VYYMLHKPKGVISATDDPSHKTVLDLLDKTlUlDKAVFPvGRLDIDTTGLLLLTNNGELAH 120 

VY M++KPKGVI AT+D H+TV+DLL + R PVGRLD DT GLLL+TN+G+ H 

Sbjct: 61 WLMMNKPKGVICATEDLEHEWIDLLGEEERHYEPSPVGRLDKDTVGLLLITNDGKFNH 120 

Query: 121 KMLSPKIOIVDKCYEVKISGIMTEDDIIjAFDKGIILKD-FTCLPALLEIvEVNQVKKQSLV 179 

++SPK HV K Y + G +TE+D+ AF G++L D + PA L I+E +S + 

Sbjct: 121 WLMSPKHHVPKTYRALVEGHVTEEDVGAFSHGVVLDDGYVTKPATLHILEAG ARSHI 177 

Query: 180 KITIKEGKFHQVKRMVAACGKEVLELKRLRMGNLQLDKQLESGQWRRLTIKEIEKL 235 

++ + EGKFHQVKRM A GK VLEL+R+++GNL LD +L G++R LT +EI L 
Sbjct: 178 ELILTEGKFHQVKRMFQAVGKRVLELERIKIGNLLLDPELARGEYRELTKEEIALL 233 

A related DNA sequence was identified in S. pyogenes <SEQ ID 4051> which encodes the amino acid
sequence <SEQ ID 4052>. Analysis of this protein sequence reveals the following: 

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0152 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

>GP:AAF09821 GB:AE001885 6-aminohexanoate-cyclic-dimer hydrolase
[Deinococcus radiodurans]
Identities = 177/485 (36%), Positives = 259/485 (52%), Gaps = 13/485 (2%)

Query: 5 DATAMAIAVQTGQTTPLELOTQAIYKAKKLNPTIjNAITSERFEAALEEAKQRDFSGL 61
DA +A + G+ + ++ T AI++A+ +N UJA+ ++ L +A+ D +
Sbjct: 54 DALDIAQLFRRGELSAEDMCTARIHRAQVVNVAIjNKVvYPLYDQ^IAQA 113

Query: 62 PFAGVPLFLKDLGQELKGHSSTSGSRLFKEYQATKTDLFVKRLEALGFIILGRSNT 117
PFAGVP +KD G L G T G+R +++ D V+R +A G + LG++NT
Sbjct: 114 QATGPFAGVPFLVKDFGSRLAGVPHTGGTRAYRDQIPEWDDELVRRWQAAGLLPLGKTOT 173

Query: 118 PEFGFKNISDSSLHGPVNLPRDOTRNAGGSSGGAAALVSSGISAIATASDGGGSIRIPAS 177
PEF +++ LHGP P D R GGSSGG+A+ V++GI LA A DGGGSIRIPAS
Sbjct: 174 PEFALMGVTEPELHGPTRNPVTOLGRTPGGSSGGSASAVAAGIVPLAGAGDGGGSIRIPAS 233

Query: 178 FNGLIGLKPSRGRMPVGPGSYRSWQGASVHFALTKSTODTRNLLYYLQMEQMESPFPLAT 237
GL GLKPSRGR+P G G WQGA+V LT+SVRD+ LL Q + L +
Sbjct: 234 CCGLFGLKPSRGRVPCGDGVGEPWQGAAVEHvLTRSVRDSAALLDLEQGPDAGAALFLPS 293

Query: 238 LTKDSIYQSLQRP--LTIAFYQRLSDGSPVSLDTAKALRQAVTWLREQGHQLVELEEFPV 295
+ + +PLIF GV + A++ A L GH++ E+ P
Sbjct: 294 PERPYSEEVGREPGRLRIGFSTAHPLGRSVHPECVAAVQGAARLLESLGHEVEEV-ALPW 352

Query: 296 NMTEVIRHYYIMNSVETAAMFADIEDTFGRPMTKDDMETMTWAIYQSGKDIPAWRYSQVL 355
+ + + + ++ ET A A + DT GRP D+E +TW + Q G+ A ++
Sbjct: 353 DGPALAQAFLMLYFGETGASLAACRDTLGRPARASDVEAVTWLLGQLGRSYSAADFAAAR 412

Query: 356 QKWDTYSATMASFHETYDLLLTFTTNTPAPKHGELVP DSKLMANLAQAEI FSSEEQF 412
W+ ++ M FH+ YDLLLT TP + GEL P + L+ Q ++ +
Sbjct: 413 ASWNVHARAMGRFHQNYDLLLTPVLATPPLQIGELQPRGVQAALLRAAQQMDVSGLLRRS 472

Query: 413 NLVETMFGKSLAINPYTALPNLTGQPAISLPTYETKEGLSMGIQLIAAKGREDLLLGIAE 472
V+ + L PYT L NLTGQPA+S+P + T +GL +G+Q +A RED+LL +A
Sbjct: 473 GQVDALATDILEKMPYTQLANLTGQPAMSVPLHWTADGLPVGVQFVAPLAREDVLLRLAG 532

Query: 473 QFEAA 477
Q E A
Sbjct: 533 QLEQA 537

An alignment of the GAS and GBS proteins is shown below. 

Identities = 151/240 (62%) , Positives = 183/240 (75%) 



Query: 1 MRLDKFLVECGLGSRTQVKLILKKKQISVNGNSETSPKVQVDEYRDEIKYNGTLVSYEKF 60
MRLDKFLV G+G+R+QVKL+LKKK I VN ETS K +DEY+D + Y GT + YE F
Sbjct: 2 MRLDKFLVATGVGTRSQVKLLLKKKAIFVNQKVETSAKAHIDEYKDLVTYQGTPLVYESF 61

Query: 61 VYYMLHKPKGVISATDDPSHKTVLDLLDKTARDKAVFPVGRLDIDTTGLLLLTNNGELAH 120
VYY+L+KP G +SAT D TV++LLD TAR KAVFPVGRLD DT GLLLLTNNG+LAH
Sbjct: 62 VYYLLNKPSGYVSATQDRQQATVMELLDDTARQKAVFPVGRLDKDTRGLLLLTNNGQLAH 121

Query: 121 KMLSPKKHVDKCYEVKISGIMTEDDILAFDKGIILKDFTCLPALLEIVEVNQVKKQSLVK 180
+LSPKKHV K Y K++GIMTE D F +GI LKD CLPA LE++ + ++ SLVK
Sbjct: 122 DLLSPKKHVTKEYLAKVAGIMTEADKDYFARGISLKDHQCLPAHLEVLASDLQQQTSLVK 181

Query: 181 ITIKEGKFHQVKRMVAACGKEVLELKRLRMGNLQLDKQLESGQWRRLTIKEIEKLEKYMQ 240
ITI+EGKFHQVKRMVAACGKEVL+L+RL MG L+LD L G++RRLT +E++ L Y Q
Sbjct: 182 ITIQEGKFHQVKRMVAACGKEVLDLQRLSMGPLKLDPSLAEGEFRRLTPEELQSLAPYCQ 241



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1323 

A DNA sequence (GBSx1405) was identified in S.agalactiae <SEQ ID 4053> which encodes the amino
acid sequence <SEQ ID 4054>. Analysis of this protein sequence reveals the following: 

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2811 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 10007> which encodes amino acid sequence <SEQ ID
10008> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA57350 GB:J04483 reductase [Leishmania major]
Identities = 129/277 (46%), Positives = 167/277 (59%), Gaps = 3/277 (1%)

Query: 26 TLSNTLNIPKIGFGTWQLTEGEEAYKAVTHALKVGYTHIDTAQIYGNEHSVGRAIRDSGL 85
TLSN + +P+ G G WQ GE AV AL GY HIDTA IY NE SVG +R SG+
Sbjct: 10 TLSNGVKMPQFGLGWQSPAGEVTENAVNWALCAGYRHIDTAAIYKNEESVGAGLRASGV 69

Query: 86 ARESIFLTTKIWNDKHDYHLAKASIDESLQKLGVDYIDLLLIHWPNPKALRENDAWKAGN 145
RE +F+TTK+WN + Y A+ +ES QKLGVDYIDL LIHWP K + + K
Sbjct: 70 PREDVFITTKLWm'EQGYESTIAAFEESRQKLGVDYIDLYLIHWPRGKDILSKEGKKY-- 127

Query: 146 AGTWKAMEEAYKEGKVKAIGVSNFMKHHLEALFETAEIKPMVNQIIIAPGCAQEDLVRFC 205
+W+A E+ YKE KV+AIGVSNF HHLE + + PMVKQ+ LP Q DL FC
Sbjct: 128

Query: 206
I +EA+SP G G + N + AI KY K+ AQV LRW++ + +PKS + I
Sbjct: 188

Query: 266
E N DIFDF+L +D+ ++ L++ + DPD
Sbjct: 248



A related DNA sequence was identified in S.pyogenes <SEQ ID 779> which encodes the amino acid 
sequence <SEQ ID 780>. Analysis of this protein sequence reveals the following: 

Possible site: 27
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0980 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 155/282 (54%), Positives = 204/282 (71%), Gaps = 2/282 (0%)

Query: 20
+++ T +++ IP +GFGT+Q +GEEAY++ A+K GY HIDTA IY NE SVGRA
Sbjct: 1 VMVTTVKMTSGYEIPVLGFGTYGAADGEEAYQSTLAAIKAGYRHIDTAAIYKNEESVGRA 60

Query: 80 IRDSGIARESIFLTTKIWNDKHDYHLAKASIDESLQKLGVDYIDLLLIHWPNPKALREND 139
I+DSG+ RE +F+TTK+WND H Y AK ++ SL +LG+DY+DL LIHWPNPKALR +
Sbjct: 61 IKDSGVLREDLFITTKLWNDAHSYEGAKDALAASLDRLGLDYVDLYLIHWPNPKALR--N 118

Query: 140
WK NA W+ MEEA + G +K+IGVSNFM HHLEAL ETA+I P +NQI IAPGC Q+
Sbjct: 119

Query: 200
++V +CK N+ILLEA+SP G G IF+NE+++ +A KY K+VAQVAL WSL GF+PLPKS
Sbjct: 179

Query: 260
+ I+ N+ IFD L ++D T+ L +PD SF
Sbjct: 239



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1324

A DNA sequence (GBSx1406) was identified in S.agalactiae <SEQ ID 4055> which encodes the amino
acid sequence <SEQ ID 4056>. Analysis of this protein sequence reveals the following: 

Possible site: 26
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0633 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 10009> which encodes amino acid sequence <SEQ ID
10010> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12612 GB:Z99108 similar to NAD(P)H-flavin oxidoreductase
[Bacillus subtilis]
Identities = 106/223 (47%), Positives = 150/223 (66%), Gaps = 8/223 (3%)

Query: 29 DIKKQVRRAFDFRMAIRVYN-NNDIPKEDMEYILDTAWLSPSSVGLEGWRFLVLDRQTIA 87
D+K Q+ A++FR A + ++ N + D E+IL+T LSPSS+GLE W+F+V+
Sbjct: 3 DLKTQILDAYNFRHATKEFDPNKKVSDSDFEFILETGRLSPSSLGLEPWKFVWQNP 59

Query: 88 KFRDKLKEVAWGAQYQLDTASHFVLLLAE--KGAYYNADSMINSLIRRGLGDPAALESRI 145
+FR+KL+E WGAQ QL TASHFVL+LA K YNAD + L E +
Sbjct: 60 EFREKLREYTWGAQKQLPTASHFVLIIARTAKDIKYNADYIKRHLKEVKQMPQDVYEGYL 119

Query: 146 PLYKSFQENDMKI-DSERSLVTOWTAKQTYIALGNMMTAAAMIGVDSCPIEGFDYEKVNNI 204
+ FQ+ND+ + +S+R+L+DW +KQTYIALGNMMTAAA IGVDSCPIEGF Y+ ++ I
Sbjct: 120 SKTEEFQKTOIJILLESDRTLFDWASKQTYIALGNMMTAAAQIGVDSCPIEGFQYDHIHRI 179

Query: 205 LSKEGLIDDKKEAISCMVSFGYRLREPKHSRARKERQEVITWV 247
L +EGL+++ IS MV+FGYR+R+P+ + R ++V+ WV
Sbjct: 180 LEEEGLLENGSFDISVMVAFGYRVRDPR-PKTRSAVEDWKWV 221

A related DNA sequence was identified in S. pyogenes <SEQ ID 4057> which encodes the amino acid
sequence <SEQ ID 4058>. Analysis of this protein sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1705 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below.

Identities = 126/222 (56%), Positives = 174/222 (77%), Gaps = 4/222 (1%)

Query: 28 EDIKKQVRRAFDFRMAIRVYNNNDIPKEDMEYILDTAWLSPSSVGLEGWRFLVLDRQTIA 87
+ I Q+++A FR A+RVY I ED+ ILD AWLSPSS+GLEGWRF+VLD + I
Sbjct: 3 QTIHHQIQQALHFRTAWVYKEEKISDFilLALILDAAWLSPSSIGLEGWRFVVLDNKPI- 61

Query: 88 KFRDKLKEVAWGAQYQLDTASHFVLLLAEKGAYYNADSMINSLIRRGLGDPAALESRIPL 147
++++K AWGAQYQL+TASHF+LL+AEK A Y++ ++ NSL+RRG+ + L SR+ L
Sbjct: 62 --KEEIKPFAWGAQYQLETASHFILLIAEKHARYDSPAIKNSLLRRGIKEGDGLNSRLKL 119

Query: 148 YKSFQENDMKI-DSERSLWDWTAKQTYIALGNMMTAAAMIGVDSCPIEGFDYEKVNNILS 206
Y+SFQ+ DM + D+ R+L+DWTAKQTYIALGNMM AA++G+D+CPIEGF Y+KVN+IL+
Sbjct: 120 YESFQKED^MADNPRALFDWTAKQTYIALGI»MTAALLGIDTCPIEGFHYDKVNHILA 179

Query: 207 KEGLIDDKKEAISCMVSFGYRLREPKHSRARKERQEVITWVE 248
K +ID +KE I+ M+S GYRLR+PKH++ RK ++EVI+ V+
Sbjct: 180 KHNVIDLEKEGIASMLSLGYRLRDPKHAQVRKPKEEVISWK 221



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1325 

A DNA sequence (GBSx1407) was identified in S.agalactiae <SEQ ID 4059> which encodes the amino
acid sequence <SEQ ID 4060>. This protein is predicted to be lactoylglutathione lyase (gloA). Analysis of
this protein sequence reveals the following:

Possible site: 25

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1656 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC21986 GB:U32717 lactoylglutathione lyase (gloA) [Haemophilus influenzae Rd]
Identities = 59/131 (45%), Positives = 86/131 (65%), Gaps = 2/131 (1%)

Query: 1 MPFLHTCIRVKDLDASIAFYQEALGFKEVRRNDFPENQFTLVYMALEDDPSY-ELELTYN 59
M LHT +RV DLD SI FYQ+ LG + +R ++ PE ++TL ++ ED S E+ELTYN
Sbjct: 1 MQILHTMLRVGDLDRSIKFYQDVLGMRLLRTSENPEYKYTLAFLGYEDGESAAEIELTYN 60

Query: 60
+ + Y+ G YGHIA+GVDD+ T +A + +G +VT+ +G + G + F++DPDGYK
Sbjct: 61

Query: 119
IE I K+
Sbjct: 121



A related DNA sequence was identified in S.pyogenes <SEQ ID 4061> which encodes the amino acid
sequence <SEQ ID 4062>. Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1382 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below.

Identities = 80/125 (64%), Positives = 93/125 (74%), Gaps = 1/125 (0%)

Query: 1 MPFLHTCIRVKDLDASIAFYQEALGFKEVRRNDFPENQFTLVYMALEDDPSYELELTYNY 60
M LHTCIRVKDLD S+AFY A FKE R DFP++QFTLVY+ALE + SYELELTYNY
Sbjct: 1 MKALHTCIRVKDLDQSVAFYTSAFPFKENYRKDFPDSQFTLVYLALEGE-SYELELTYNY 59

Query: 61 DHEAYDLGNGYGHIAVGVDDLETTYDAHQKAGYSVTKISGLPGKPNMFYFIQDPDGYKIE 120
H YDLGNGYGHIA+G + E + H++AG+ VT ILK +YFIQDPDGYKIE
Sbjct: 60 GHGDYDLGNGYGHIALGSEHFFJfflHKKHRQAGFPVTDIKELADKSARYYFIQDPDGYKIE 119

Query: 121 VIRLS 125
VI L+
Sbjct: 120 VIDLN 124

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1326 

A DNA sequence (GBSx1408) was identified in S.agalactiae <SEQ ID 4063> which encodes the amino
acid sequence <SEQ ID 4064>. Analysis of this protein sequence reveals the following: 

Possible site: 29

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.02 Transmembrane 241 - 257 ( 229 - 262)
INTEGRAL Likelihood = -4.94 Transmembrane 270 - 286 ( 264 - 287)

Final Results

bacterial membrane Certainty=0.4609 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 



>GP:CAB12688 GB:Z99108 stress response protein [Bacillus subtilis]
Identities = 139/304 (45%), Positives = 200/304 (65%), Gaps = 3/304 (0%)

Query: 3 LLSVIVPCYNEQETVSTFLTEIKKVESEMARYTHFEYIFVNDGSTDRTLELLKKAAKQFD 62
L+S+I+P YNE V +KK E + Y +E F+NDGS D TL+ +K A
Sbjct: 5 LISIIIPSYNEGYNVKLIHESLKK-EFKNIHYD-YEIFFINDGSVDDTLQQIKDLAATCS 62

Query: 63 NVHYLSFSRHFGKDAALLAGLEHTTGDFITVMDVDLQDPPTLLPEMYLKLQEGYDIVATR 122
V Y+SFSR+FGK+AA+LAG EH G+ + VMD DLQ P LL E +EGYD V +
Sbjct: 63 RVKYISFSRNFGKI^IIjAGFEHVO^EAVIvM3ADLQHPTYLLKEFIKGYEEGYDQVIAQ 122

Query: 123 RKDRKGEPLIRSLFAKLFYKLINQVSDTKMVDGARDFRLMTKQVVDSILELNEVNRFSKG 182
R +RKG+ +RSL + ++YK IN+ + + DG DFRL+++Q V+++L+L+E NRFSKG
Sbjct: 123 R-NRKGDSFVRSLLSSMYYKFINKAVEVDLRDGVGDFRLLSRQAVNALLKLSEGNRFSKG 181

Query: 183 IFSWIGYDVAYISYENRERIAGKTSWSFFNLLKYSLDGFINFSEIPLAIATWIGTLSSVL 242
+F WIG+D + YEN ER G + WSF +L Y +DG ++F+ PL + + G +L
Sbjct: 182 LFCWIGFDQKIVFYENVERKNGTSKWSFSSLFNYGMDGWSFNHKPLRLCFYTGIFILLL 241

Query: 243 SLIAIIFIIIRKLLFGDPVSGWASTVTIVLFMGGIQLLSLGIIGKYISKIFLETKKRPVY 302
S++ II ++ L G V G+ + ++ VLF+GG+QLLSLGIIG+YI +I+ ETKKRP Y
Sbjct: 242 SIIYIIATFVKILTNGISVPGYFTIISAVLFLGGVQLLSLGIIGEYIGRIYYETKKRPHY 301

Query: 303 IVKE 306
++KE
Sbjct: 302 LIKE 305



A related DNA sequence was identified in S.pyogenes <SEQ ID 4065> which encodes the amino acid
sequence <SEQ ID 4066>. Analysis of this protein sequence reveals the following: 

Possible site: 56

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.55 Transmembrane 256 - 272 ( 251 - 282)
INTEGRAL Likelihood = -5.31 Transmembrane 290 - 306 ( 284 - 307)

Final Results

bacterial membrane Certainty=0.4821 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related sequence was also identified in GAS <SEQ ID 9113> which encodes the amino acid sequence
<SEQ ID 9114>. Analysis of this protein sequence reveals the following:

Possible cleavage site: 36
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.482 (Affirmative) < suco
bacterial outside Certainty=0.000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 207/307 (67%), Positives = 258/307 (83%)

Query: 1 MALLSVIVPCYNEQETVSTFLTEIKKVESEMARYTHFEYIFVNDGSTDRTLELLKKAAKQ 60
M LLS+IVPC+NE+ + + E+ ++E+ M FEYIF++DGS D TL +L++ A +
Sbjct: 21 MTLLSIIVPCFNEEANILPYFEEMHQLETSMTNQLAFEYIFIDDGSKDNTLGILRELAAR 80

Query: 61 FDNVHYLSFSRHFGKDAALLAGLEHTTGDFITVMDVDLQDPPTLLPEMYLKLQEGYDIVA 120
F NVHYLSFSRHFGK+A LLAGL+ G++ITVMDVDLQDPP LLP MY KL+EGYDIV
Sbjct: 81 FPNVHYLSFSRHFGKEAGLIAGLKEMGNYITVMDVDLQDPPELLPIMYAKLKEGYDIVG 140

Query: 121 TRRKDRKGEPLIRSLFAKLFYKLINQVSDTKMVDGARDFRLMTKQVVDSILELNEVNRFS 180
TRR++R+GEPLIRS+ + LFY LI +SDT+MV+G RD+RLMT+QVVDSILEL EVNRFS
Sbjct: 141 TRRQNRQGEPLIRSMCSNLFYGLIKHLSDTEMVNGVRDYRLMTRQVVDSILELGEVNRFS 200

Query: 181 KGIFSWIGYDVAYISYENRERIAGKTSWSFFNLLKYSLDGFINFSEIPLAIATWIGTLSS 240
KGIFSW+GY + Y+S+EN++R GK+ W F+ LL+YSLDGFINFSE+PL IATW GT S
Sbjct: 201 KGIFSWVGYRITYLSFENQKRKYGKSRWHFWELLRYSLDGFINFSEMPLTIATWTGTFSF 260

Query: 241 VLSLLAIIFIIIRKLLFGDPVSGWASTVTIVLFMGGIQLLSLGIIGKYISKIFLETKKRP 300
++S+ AI+FIIIRK+LFGDPVSGWASTV+I+LFMGGIQL +GIIGKYISKIFLETKKRP
Sbjct: 261 LISIFAILFIIIRKILFGDPVSGWASTVSIILFMGGIQLFCMGIIGKYISKIFLETKKRP 320

Query: 301 VYIVKEE 307
+YI+KE+
Sbjct: 321 LYIIKEK 327

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
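
The INTEGRAL lines above report, for each predicted transmembrane segment, a likelihood value, the core segment and (in parentheses) an extended range. A minimal sketch of extracting these fields is shown below. The record and function names are illustrative assumptions, and the pattern is based only on the line layout visible in this text.

```python
import re
from collections import namedtuple

# One predicted segment: likelihood value, core range and extended range.
Segment = namedtuple("Segment", "likelihood start end ext_start ext_end")

INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral_lines(text):
    """Return a list of Segment records found in the given text."""
    return [
        Segment(float(likelihood), int(start), int(end), int(ext_start), int(ext_end))
        for likelihood, start, end, ext_start, ext_end in INTEGRAL_RE.findall(text)
    ]


if __name__ == "__main__":
    block = """
    INTEGRAL Likelihood = -9.02 Transmembrane 241 - 257 ( 229 - 262)
    INTEGRAL Likelihood = -4.94 Transmembrane 270 - 286 ( 264 - 287)
    """
    for seg in parse_integral_lines(block):
        core_length = seg.end - seg.start + 1
        print(f"{seg.start}-{seg.end} ({core_length} residues), likelihood {seg.likelihood}")
```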

Example 1327 

A DNA sequence (GBSx1409) was identified in S.agalactiae <SEQ ID 4067> which encodes the amino
acid sequence <SEQ ID 4068>. This protein is predicted to be d-serine/d-alanine/glycine transporter (cycA).
Analysis of this protein sequence reveals the following: 

Possible site: 49

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.44 Transmembrane 50 - 66 ( 50 - 66)
INTEGRAL Likelihood = -1.49 Transmembrane 27 - 43 ( 27 - 43)

Final Results

bacterial membrane Certainty=0.1977 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA83253 GB:Z31377 potential amino acid permease
[Lactobacillus delbrueckii]
Identities = 34/55 (61%), Positives = 44/55 (79%)

Query: 7 DHTQKSENGMVRGLENRHVQLIAIAGTIGTGLFLGAGRSISLTGPSIVLVYAITG 61
D + ++ +G +R L NRHVQ+IAI GTIGTGLFLGAG +IS TGPS++ +YAI G
Sbjct: 5 DRSIENTDGTIRSLSNRHVQMIAIGGTIGTGLFLGAGTTISATGPSVIFIYAIMG 59

A related DNA sequence was identified in S.pyogenes <SEQ ID 4069> which encodes the amino acid
sequence <SEQ ID 4070>. Analysis of this protein sequence reveals the following:



Possible site: 53
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -11.15 Transmembrane 170 - 186 ( 161 - 190)
INTEGRAL Likelihood = -8.44 Transmembrane 256 - 272 ( 252 - 274)
INTEGRAL Likelihood = -8.33 Transmembrane 352 - 368 ( 347 - 375)
INTEGRAL Likelihood = -7.54 Transmembrane 139 - 155 ( 133 - 160)
INTEGRAL Likelihood = -5.73 Transmembrane 420 - 436 ( 417 - 440)
INTEGRAL Likelihood = -3.88 Transmembrane 56 - 72 ( 54 - 75)
INTEGRAL Likelihood = -3.40 Transmembrane 283 - 299 ( 282 - 300)
INTEGRAL Likelihood = -3.29 Transmembrane 440 - 456 ( 439 - 458)
INTEGRAL Likelihood = -1.49 Transmembrane 31 - 47 ( 31 - 47)
INTEGRAL Likelihood = -1.33 Transmembrane 109 - 125 ( 109 - 127)

Final Results

bacterial membrane Certainty=0.5458 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the databases: 

>GP:CAB14651 GB:Z99117 amino acid permease [Bacillus subtilis] 
Identities = 210/454 (46%) , Positives = 296/454 (64%) , Gaps = 11/454 (2%) 

Query: 12 DNNELENGMTOGLENRHVQLIAIAGTIGTGLFLGAGRSIALTGPSIIFVYMITGAFMFMM 71 

DN + + RGL+NRH+QL+AI G IGTGLFLG+G+SI GPSI+F Y+ITG F F + 
Sbjct: 8 DNFGQQQKLSRGLKNRHIQLMAIGGAIGTGLFLGSGKSIHFAGPSILFAYLITGVFCFFI 67 

Query: 72 MRAIGEMLYYDPDQHTFINFISIWIGPGWGYFSGLSYWISLIFIGMAEITAVGAYVQFWF 131 

+R++GE+L + H+F++F+ Y+G + +G +YW I + MA++TAVG Y Q+W 
Sbjct: 68 IRSLGELLLSNAGYHSFVDFVRDYIiGNMAAFITGWTYWFCWISLAMADLTAVGIYTQYWL 127 

Query: 132 PSWPAWLIQLVFLVLLSSINLIAVRVFGETEFWFAMIKILAILALIATAIFMVLTGFETH 191 

P P WL L+ L++L +NL V++FGE EFWFA+IK++AILALI T I ++ GF 
Sbjct: 128 PDVPQWLPGLI^IILLIMNLATVKLFGELEFWFALIKVIAILALIVTGILLIAKGFSAA 187 

Query: 192 TGHASLSNIFDHFSMFPNGKLKFFMAFQMVFFAYQAIEFVGITTSETANPRKVLPKAIQE 251 

+G ASL+N++ H MFPNG F ++FQMV FA+ IE VG+T ET NP+KV+PKAI + 
Sbjct: 188 SGPASLNNLWSHGGMFPNGWHGFILSFQMWFAFVGIELVGLTAGETENPQKVIPKAINQ 247 

Query: 252 IPTRIVIFYVGALVSIMAIVPraQLPVDESPFVMVFKLIGIKWAAALINFVVLTSAASAL 311 

IP RI++FYVGAL IM I PW+ L +ESPFV VF +GI AA+LINFWLTSAASA ' 
Sbjct: 248 IPVRILLFYVGALFVIMCIYPWNVLNPNESPFVQVFSAVGIWAASLINFWLTSAASAA 307 

Query: 312 NSTLYSTGRHLYQIANE- -TPNALTNRLKINTLSRQGVPSRAIIASAWVGISALINILP 369 

NS L+ST R +Y +A + PL L+ VPS A+ S++ + 1 +N L 

Sbjct: 308 NSALFSTSRMVYSLAKDHHAPGLL KKLTSSNVPSNALFFSSIAILIGVSLNYLM 361 

Query: 370 GVADAFSLITASSSGVYIAIYALTMIAHWKYRQSK--DFMADGYLMPKYKVTTPLTLAFF 427 

F+LIT+ S+ +11+ +T+I H KYR+++ + A+ + MP Y ++ LTLAF 
Sbjct: 362 -PEQVFTLITSVSTICFIFIWGITVICHLKYRKTRQHEAKANKFKMPFYPLSNYLTLAFL 420 

Query: 428 AFVFISLFLQESTYIGAIGATIWIIIFGIYSNVK 461 

AF+ + L L T I +W ++ I V+ 

Sbjct: 421 AFILVILALANDTRIALFVTPWFVLLIILYKVQ 454 






An alignment of the GAS and GBS proteins is shown below. 

Identities = 48/62 (77%) , Positives = 51/62 (81%) 

Query: 1 MSKNNNDHTQKSENGMVRGLENRHVQLIAIAGTIGTGLFLGAGRSISLTGPSIVLVYAITGA 62
MS + ENGMVRGLENRHVQLIAIAGTIGTGLFLGAGRSI+LTGPSI+ VY ITGA
Sbjct: 5 MSIKEQTDNNELENGMVRGLENRHVQLIAIAGTIGTGLFLGAGRSIALTGPSIIFVYMITGA 66

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1328 

A DNA sequence (GBSx1411) was identified in S.agalactiae <SEQ ID 4071> which encodes the amino
acid sequence <SEQ ID 4072>. This protein is predicted to be alkylphosphonate uptake protein (phnA).
Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0965 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
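
The localization report above can be read mechanically. The following is a minimal sketch, assuming the three-line "Final Results" layout used throughout these Examples; the helper names `parse_localization` and `best_site` are hypothetical and are not part of the original analysis.

```python
import re

# Hypothetical helper for illustration only. It matches lines such as
# "bacterial cytoplasm Certainty=0.0965 (Affirmative)".
LINE_RE = re.compile(
    r"bacterial\s+(?P<site>\w+)\s+Certainty\s*=\s*"
    r"(?P<score>[\d.\s]+?)\s*\((?P<call>[^)]+)\)"
)


def parse_localization(text):
    """Return {site: (certainty, call)} from PSORT-style result lines."""
    results = {}
    for match in LINE_RE.finditer(text):
        # Tolerate stray spaces inside the number (e.g. "0 . 0965").
        score = float(match.group("score").replace(" ", ""))
        results[match.group("site")] = (score, match.group("call").strip())
    return results


def best_site(results):
    """Pick the compartment with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


report = """
bacterial cytoplasm Certainty=0.0965 (Affirmative)
bacterial membrane Certainty=0.0000 (Not Clear)
bacterial outside Certainty=0.0000 (Not Clear)
"""
scores = parse_localization(report)
print(best_site(scores), scores)
```

The sketch only ranks the reported certainties; it does not reproduce how the prediction program derives them.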

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC77069 GB:AE000483 orf, hypothetical protein [Escherichia coli K12]
Identities = 79/110 (71%), Positives = 91/110 (81%), Gaps = 1/110 (0%)

Query: 1 MSLPNCPKCNSEYVYEDGILLVCPECAYEWNPEE-IEEEVGLIVLDSNGTRLSDGDTVTV 59
MSLP+CPKCNSEY YED + +CPECAYEWN E +E LIV D+NG L+DGD+VT+
Sbjct: 1 MSLPHCPKCNSEYTYEDNGMYICPECAYEWNDAEPAQESDELIVKDANGNLLADGDSVTI 60

Query: 60 IKDLKVKGAPKDIKQGTRVKNIRLVDGDHNIDCKIDGFGAMKLKSEFVKK 109
IKDLKVKG+ +K GT+VKNIRLV+GDHNIDCKIDGFG MKLKSEFVKK
Sbjct: 61 IKDLKVKGSSSMLKIGTKVKNIRLVEGDHNIDCKIDGFGPMKLKSEFVKK 110

A related DNA sequence was identified in S.pyogenes <SEQ ID 4073> which encodes the amino acid
sequence <SEQ ID 4074>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3428 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 73/85 (85%) , Positives = 79/85 (92%) , Gaps = 1/85 (1%) 

Query: 26 CAYEWNP-EEIEEEVGLIVLDSNGTRLSDGDTVTVIKDLKVKGAPKDIKQGTRVKNIRLV 84
CA+EW P EE EE GL+VLDSNG RLSDGDT+TV+KDLKVKGAPKD+KQGTRVKNIRLV
Sbjct: 2 CAFEWTPGEEATEEEGLVVLDSNGVRLSDGDTITVVKDLKVKGAPKDLKQGTRVKNIRLV 61

Query: 85 DGDHNIDCKIDGFGAMKLKSEFVKK 109
+GDHNIDCKIDGFGAMKLKSEFVKK
Sbjct: 62 EGDHNIDCKIDGFGAMKLKSEFVKK 86
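
The "Identities" and "Positives" figures quoted with each alignment can be recounted from the middle line of an alignment block: a letter marks an identical residue and "+" marks a conservative substitution. The sketch below is illustrative only; `midline_stats` and `percent` are hypothetical names, and the use of truncation is an assumption based on figures such as 73/85 (85.9%) being printed as 85%.

```python
def midline_stats(query, midline, sbjct):
    """Count identities, positives and gaps for one aligned segment.

    All three strings must have the same length (one column per character).
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("alignment rows must have equal length")
    # A letter on the midline means the two residues are identical.
    identities = sum(1 for ch in midline if ch.isalpha())
    # "+" means a conservative substitution; positives include identities.
    positives = identities + midline.count("+")
    # A gap is a "-" in either sequence row.
    gaps = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    return identities, positives, gaps, len(midline)


def percent(count, total):
    """Integer percentage, truncated (assumption, see lead-in)."""
    return count * 100 // total


# Final segment of the GAS/GBS alignment shown above (residues 85-109).
query = "DGDHNIDCKIDGFGAMKLKSEFVKK"
midline = "+GDHNIDCKIDGFGAMKLKSEFVKK"
sbjct = "EGDHNIDCKIDGFGAMKLKSEFVKK"
segment = midline_stats(query, midline, sbjct)
print(segment)
print(percent(73, 85), percent(79, 85), percent(1, 85))
```

Summing the per-segment counts over every block of an alignment gives the totals reported in its header line.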

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1329

A DNA sequence (GBSx1412) was identified in S.agalactiae <SEQ ID 4075> which encodes the amino
acid sequence <SEQ ID 4076>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3665 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database, but there is
homology to SEQ ID 500.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1330 

A DNA sequence (GBSx1414) was identified in S.agalactiae <SEQ ID 4077> which encodes the amino
acid sequence <SEQ ID 4078>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.11 Transmembrane 558 - 574 ( 558 - 574)

Final Results

bacterial membrane Certainty=0.1044 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB11971 GB:Z99105 L-glutamine-D-fructose-6-phosphate
amidotransferase [Bacillus subtilis]
Identities = 355/604 (58%), Positives = 445/604 (72%), Gaps = 4/604 (0%)



Query: 1 MCGIVGWGNTNATDILIQGLEKLEYRGYDSAGIFWGDNKSQLVKSVGRIAEIQAKVGD 60
MCGIVG +G +A +IL++GLEKLEYRGYDSAGI V + + K GRIA+++ V
Sbjct: 1 MCGIVGYIGQLDAKEILLKGLEKLEYRGYDSAGIAVANEQGIHVFKEKGRIADLREVVDA 60

Query: 61 SVSGTTGIGHTRWATHGKPTEGNAHPHTSGSGRFVLVHNGVIENYLQIKETYLTKHNLKG 120
+V GIGHTRWATHG+P+ NAHPH S GRF LVHNGVIENY+Q+K+ YL LK
Sbjct: 61 NVEAKAGIGHTRWATHGEPSYLNAHPHQSALGRFTLVHNGVIENYVQLKQEYLQDVELKS 120

Query: 121 ETDTEIAIHLVEHFVEEDNLSVLEAFKKALHIIEGSYAFALIDSQDADTIYVAKNKSPLL 180
+TDTE+ + ++E FV L EAF+K L +++GSYA AL D+ + +TI+VAKNKSPLL
Sbjct: 121 DTDTEVWQVIEQFVN-GGLETEEAFRKTLTLLKGSYAIALFDNDNRETIFVAKNKSPLL 179

Query: 181 IGLGNGYNMVCSDAMAMIRETSEYMEIHDKELVIVKKDSVEVQDYDGNVIERGSYTAELD 240
+GLG+ +N+V SDAMAM++ T+EY+E+ DKE+VIV D V +++ DG+VI R SY AELD
Sbjct: 180 VGLGDTFNWASDAMAMLQVTNEYVELMDKEMVIVTDDQVVIKNLDGDVITRASYIAELD 239

Query: 241 LSDIGKGTYPFYMLKEIDEQPTVMRKLISTYANESGDMNVDSDIIKSVQEADRLYILAAG 300
SDI KGTYP YMLKE DEQP VMRK+I TY +E+G ++V DI +V EADR+YI+ G
Sbjct: 240 ASDIEKGTYPHYMLKETDEQPWMRKIIQTYQDENGKLSVPGDIAAAVAEADRIYIIGCG 299

Query: 301 TSYHAGFAAKTMIEKLTDTPVELGVSSEWGYNMPLLSKKPMFILLSQSGETADSRQVLVK 360
TSYHAG K IE + PVE+ V+SE+ YNMPLLSKKP+FI LSQSGETADSR VLV+
Sbjct: 300 TSYHAGLVGKQYIEMWANVPVEVHVASEFSYNMPLLSKKPLFIFLSQSGETADSRAVLVQ 359

Query: 361 ANEMGIPSLTITNVPGSTLSREATYTMLIHAGPEIAVASTKAYTAQVATLAFLAKAVGEA 420
+G +LTITNVPGSTLSREA YT+L+HAGPEIAVASTKAYTAQ+A LA LA +
Sbjct: 360 VKALGHKALTITNVPGSTLSREADYTLLLHAGPEIAVASTKAYTAQIAVLAVLASVAADK 419



Query: 421 NGKAEAKDFDLVHELSIVAQSIEATLSEKDVISEKVEQLLISTRNAFYIGRGNDYYVTME 480 

NG FDLV EL I A ++EA +KD + + L +RNAF+IGRG DY+V +E 

Sbjct: 420 NGINIG--FDLVKELGIAANAMEALCDQKDEMEMIAREYLTVSRNAFFIGRGLDYFVCVE 477 
Query: 481 AALKLKEISYIQTEGFAAGELKHGTISLIEDNTPVIALISADSTIAAHTRGNIQEWSRG 540
ALKLKEISYIQ EGFA GELKHGTI+LIE TPV AL + + + RGN++EV +RG
Sbjct: 478 GALKLKEISYIQAEGFAGGELKHGTIALIEQGTPVFALATQEH-VNLSIRGNVKEVAARG 536

Query: 541 ANALIIVEEGLEREGDDIIVNKVHPFLSAISMVIPTQLIAYYASLQRGLDVDKPRNLAKA 600
AN II +GL+ D ++ +V+P L+ + V+P QLIAYYA+L RG DVDKPRNLAK+
Sbjct: 537 ANTCIISLKGLDDADDRFVLPEVNPAIAPLVSVVPLQLIAYYAALHRGCDVDKPRNLAKS 596

Query: 601 VTVE 604
VTVE
Sbjct: 597 VTVE 600

A related DNA sequence was identified in S.pyogenes <SEQ ID 4079> which encodes the amino acid 
sequence <SEQ ID 4080>. Analysis of this protein sequence reveals the following: 

Possible site: 39

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.06 Transmembrane 558 - 574 ( 558 - 574)

Final Results

bacterial membrane Certainty=0.1426 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB11971 GB:Z99105 L-glutamine-D-fructose-6-phosphate
amidotransferase [Bacillus subtilis]
Identities = 353/604 (58%), Positives = 445/604 (73%), Gaps = 4/604 (0%)

Query: 1 MCGIVGWGNRNATDILMQGLEKLEYRGYDSAGIFVANANQTNLIKSVGRIADLRAKIGI 60
MCGIVG +G +A +IL++GLEKLEYRGYDSAGI VAN ++ K GRIADLR +
Sbjct: 1 MCGIVGYIGQLDAKEILLKGLEKLEYRGYDSAGIAVANEQGIHVFKEKGRIADLREVVDA 60

Query: 61 DVAGSTGIGHTRWATHGQSTEDNAHPHTSQTGRFVLVHNGVIENYLHIKTEFLAGHDFKG 120
+V GIGHTRWATHG+ + NAHPH S GRF LVHNGVIENY+ +K E+L + K
Sbjct: 61 NVEAKAGIGHTRWATHGEPSYLNAHPHQSALGRFTLVHNGVIENYVQLKQEYLQDVELKS 120

Query: 121 QTDTEIAVHLIGKFVEEDKLSVLEAFKKSLSIIEGSYAFALMDSQATDTIYVAKNKSPLL 180
TDTE+ V +I +FV L EAF+K+L++++GSYA AL D+ +TI+VAKNKSPLL
Sbjct: 121 DTDTEVWQVIEQFVNGG-LETEEAFRKTLTLLKGSYAIALFDNDNRETIFVAKNKSPLL 179

Query: 181 IGLGEGYNMVCSDAMAMIRETSEFMEIHDKELVILTKDKVTVTDYDGKELIRDSYTAELD 240
+GLG+ +N+V SDAMAM++ T+E++E+ DKE+VI+T D+V + + DG + R SY AELD
Sbjct: 180 VGLGDTFNWASDAMAMLQVTNEYVELMDKEMVIVTDDQWIKNLDGDVITRASYIAELD 239

Query: 241 LSDIGKGTYPFYMLKEIDEQPTVMRQLISTYADETGNVQVDPAIITSIQEADRLYILAAG 300
SDI KGTYP YMLKE DEQP VMR++I TY DE G + V I ++ EADR+YI+ G
Sbjct: 240 ASDIEKGTYPHYMLKETDEQPWMRKIIQTYQDENGKLSVPGDIAAAVAEADRIYIIGCG 299

Query: 301 TSYHAGFATKNMLEQLTDTPVELGVASEWGYHMPLLSKKPMFILLSQSGETADSRQVLVK 360
TSYHAG K +E + PVE+ VASE+ Y+MPLLSKKP+FI LSQSGETADSR VLV+
Sbjct: 300 TSYHAGLVGKQYIEMWANVPVEVHVASEFSYNMPLLSKKPLFIFLSQSGETADSRAVLVQ 359

Query: 361 ANAMGIPSLTVTNVPGSTLSREATYTMLIHAGPEIAVASTKAYTAQIAALAFLAKAVGEA 420
A+G +LT+TNVPGSTLSREA YT+L+HAGPEIAVASTKAYTAQIA LA LA +
Sbjct: 360 VKALGHKALTITNVPGSTLSREADYTLLLHAGPEIAVASTKAYTAQIAVLAVLASVAADK 419

Query: 421 NGKQEALDFNLVHELSLVAQSIEATLSEKDLVAEKVQALLATTRNAFYIGRGNDYYVAME 480
NG + F+LV EL + A ++EA +KD + + L +RNAF+IGRG DY+V +E
Sbjct: 420 NGIN--IGFDLVKELGIAANAMEALCDQKDEMEMIAREYLTVSRNAFFIGRGLDYFVCVE 477

Query: 481 AALKLKEISYIQCEGFARGELKHGTISLIEEDTPVIALISSSQLVASHTRGNIQEVAARG 540
ALKLKEISYIQ EGFA GELKHGTI +LIE+ TPV AL + + S RGN++EVAARG
Sbjct: 478 GALKLKEISYIQAEGFAGGELKHGTIALIEQGTPVFALATQEHVNLS-IRGNVKEVAARG 536

Query: 541 AHVLTVVEEGLDREGDDIIVNKVHPFLAPIAMVIPTQLIAYYASLQRGLDVDKPRNLAKA 600
A+ + +GLD D ++ +V+P LAP+ V+P QLIAYYA+L RG DVDKPRNLAK+
Sbjct: 537 ANTCIISLKGLDDADDRFVLPEVNPAIAPLVSVVPLQLIAYYAALHRGCDVDKPRNLAKS 596

Query: 601 VTVE 604
VTVE
Sbjct: 597 VTVE 600

An alignment of the GAS and GBS proteins is shown below. 

Identities = 500/604 (82%) , Positives = 552/604 (90%) 

Query: 1 MCGIVGWGNTNATDILIQGLEKLEYRGYDSAGIFWGDNKSQLVKSVGRIAEIQAKVGD 60
MCGIVGWGN NATDIL+QGLEKLEYRGYDSAGIFV N++ L+ KSVGRIA+ + +AK+G
Sbjct: 1 MCGIVGWGNRNATDILMQGLEKLEYRGYDSAGIFVANANQTNLIKSVGRIADLRAKIGI 60

Query: 61 SVSGTTGIGHTRWATHGKPTEGNAHPHTSGSGRFVLVHNGVIENYLQIKETYLTKHNLKG 120
V+G+TGIGHTRWATHG+ TE NAHPHTS +GRFVLVHNGVIENYL IK +L H+ KG
Sbjct: 61 DVAGSTGIGHTRWATHGQSTEDNAHPHTSQTGRFVLVHNGVIENYLHIKTEFLAGHDFKG 120

Query: 121 ETDTEIAIHLVEHFVEEDNLSVLEAFKKALHIIEGSYAFALIDSQDADTIYVAKNKSPLL 180
+TDTEIA+HL+ FVEED LSVLEAFKK+L IIEGSYAFAL+DSQ DTIYVAKNKSPLL
Sbjct: 121 QTDTEIAVHLIGKFVEEDKLSVLEAFKKSLSIIEGSYAFALMDSQATDTIYVAKNKSPLL 180

Query: 181 IGLGNGYNMVCSDAMAMIRETSEYMEIHDKELVIVKKDSVEVQDYDGNVIERGSYTAELD 240
IGLG GYNMVCSDAMAMIRETSE+MEIHDKELVI + KD V V DYDG + R SYTAELD
Sbjct: 181 IGLGEGYNMVCSDAMAMIRETSEFMEIHDKELVILTKDKVTVTDYDGKELIRDSYTAELD 240

Query: 241 LSDIGKGTYPFYMLKEIDEQPTVMRKLISTYANESGDMNVDSDIIKSVQEADRLYILAAG 300
LSDIGKGTYPFYMLKEIDEQPTVMR+LISTYA+E+G++ VD II S+QEADRLYILAAG
Sbjct: 241 LSDIGKGTYPFYMLKEIDEQPTVMRQLISTYADETGNVQVDPAIITSIQEADRLYILAAG 300

Query: 301 TSYHAGFAAKTMIEKLTDTPVELGVSSEWGYNMPLLSKKPMFILLSQSGETADSRQVLVK 360
TSYHAGFA K M+E+LTDTPVELGV+SEWGY+MPLLSKKPMFILLSQSGETADSRQVLVK
Sbjct: 301 TSYHAGFATKNMLEQLTDTPVELGVASEWGYHMPLLSKKPMFILLSQSGETADSRQVLVK 360

Query: 361 ANEMGIPSLTITNVPGSTLSREATYTMLIHAGPEIAVASTKAYTAQVATLAFLAKAVGEA 420
AN MGIPSLT+TNVPGSTLSREATYTMLIHAGPEIAVASTKAYTAQ+A LAFLAKAVGEA
Sbjct: 361 ANAMGIPSLTVTNVPGSTLSREATYTMLIHAGPEIAVASTKAYTAQIAALAFLAKAVGEA 420

Query: 421 NGKAEAKDFDLVHELSIVAQSIEATLSEKDVISEKVEQLLISTRNAFYIGRGNDYYVTME 480
NGK EA DF+LVHELS+VAQSIEATLSEKD+++EKV+ LL +TRNAFYIGRGNDYYV ME
Sbjct: 421 NGKQEALDFNLVHELSLVAQSIEATLSEKDLVAEKVQALLATTRNAFYIGRGNDYYVAME 480

Query: 481 AALKLKEISYIQTEGFAAGELKHGTISLIEDNTPVIALISADSTIAAHTRGNIQEWSRG 540
AALKLKEISYIQ EGFARGELKHGTISLIE++TPVIALIS+ +A+HTRGNIQEV +RG
Sbjct: 481 AALKLKEISYIQCEGFAAGELKHGTISLIEEDTPVIALISSSQLVASHTRGNIQEVAARG 540

Query: 541 ANALIIVEEGLEREGDDIIVNKVHPFLSAISMVIPTQLIAYYASLQRGLDVDKPRNLAKA 600
A+ L +VEEGL+REGDDIIVNKVHPFL+ I+MVIPTQLIAYYASLQRGLDVDKPRNLAKA
Sbjct: 541 AHVLTVVEEGLDREGDDIIVNKVHPFLAPIAMVIPTQLIAYYASLQRGLDVDKPRNLAKA 600

Query: 601 VTVE 604
VTVE
Sbjct: 601 VTVE 604

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1331 

A DNA sequence (GBSx1415) was identified in S.agalactiae <SEQ ID 4081> which encodes the amino
acid sequence <SEQ ID 4082>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9797> which encodes amino acid sequence <SEQ ID 9798>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC44435 GB:U65000 type-I signal peptidase SpsB [Staphylococcus aureus]
Identities = 62/185 (33%), Positives = 97/185 (51%), Gaps = 12/185 (6%)

Query: 10 VKRDFIRNIILALIAVLILILLRYFVFATFKVHKDATNSYFSNGDVVVVN RNRTPK 65
+K++ + II +A +IL ++ F+ + + ++ + +G+ V VN + +
Sbjct: 1 MKKELLEWIISIAVAFVILFIVGKFIVTPYTIKGESMDPTLKDGERVAVNIIGYKTGGLE 60

Query: 66 YKDFIWKVGKIF-YISRVIGEPNQKVRVMDDILYLNDVFKDEPYIEKMKNAYSEKKDGQ 124
+ +V+ K Y+ RVIG P KV +D LY+N +DEPY+ N + K G
Sbjct: 61 KGNVWFHANKNDDYVKRVIGVPGDKVEYKNDTLYVNGKKQDEPYL NYNLKHKQGD 116

Query: 125 MPFTSDFSVETL--TRNKESRVPKGSYLVLNDNRQNKNDSRKFGLIKEKDIRGVITFKVY 182
T F V+ L K + +PKG YLVL DNR+ DSR FGLI E I G ++F+ +
Sbjct: 117 Y-ITGTFQVKDLPNANPKSNVIPKGKYLVLGDNREVSKDSRAFGLIDEDQIVGKVSFRFW 175

Query: 183 PLSEF 187
P SEF
Sbjct: 176 PFSEF 180

A related DNA sequence was identified in S.pyogenes <SEQ ID 4083> which encodes the amino acid
sequence <SEQ ID 4084>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -14.22 Transmembrane 10 - 26 ( 4 - 34)

Final Results

bacterial membrane Certainty=0.6689 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 99/185 (53%), Positives = 130/185 (69%)

Query: 9 MVKRDFIRNIILALIAVLILILLRYFVFATFKVHKDATNSYFSNGDVVVVNRNRTPKYKD 68
MVKRDFIRNI+L LI ++ ILLR FVF+TFKV + N+Y +GD+V + +N PKYKD
Sbjct: 1 MVKRDFIRNILLLLIVIIGAILLRIFVFSTFKVSPETANTYLKSGDLVTIKKNIQPKYKD 60

Query: 69 FIVYKVGKIFYISRVIGEPNQKVRVMDDILYLNDVFKDEPYIEKMKNAYSEKKDGQMPFT 128
F+VY+VGK Y+SRVI V MDDI YLN++ + + Y+EKMK Y +T
Sbjct: 61 FWYRVGKKDYVSRVIAVEGDSVTYMDDIFYLNNMVESQAY 120

Query: 129 SDFSVETLTRNKESRVPKGSYLVLNDNRQNKNDSRKFGLIKEKDIRGVITFKVYPLSEFG 188
DF+V T+T +K +VPKG YL+LNDNR+N NDSR+FGLI I+G++TF+V PLS+FG
Sbjct: 121 DDFTVATITADKYQKVPKGKYLLLNDNRKNTNDSRRFGLINASQIKGLVTFRVLPLSDFG 180

Query: 189 FTASE 193
F E
Sbjct: 181 FVEVE 185



A related GBS gene <SEQ ID 8789> and protein <SEQ ID 8790> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 10.13
GvH: Signal Score (-7.5): 0.45
Possible site: 37
>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 0 value: 3.82 threshold: 0.0
PERIPHERAL Likelihood = 3.82 69
modified ALOM score: -1.26

*** Reasoning Step: 3

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

36.0/59.9% over 165aa

Bacillus caldolyticus
EGAD|24914| signal peptidase i Insert characterized

ORF00169(364 - 867 of 1179)

EGAD|24914|25718(15 - 180 of 182) signal peptidase i {Bacillus caldolyticus}
%Match = 11.9
%Identity = 35.9 %Similarity = 59.9

Matches = 60 Mismatches = 61 Conservative Sub.s = 40

312 342 372 402 432 462 483 510
L*KHDIMEKRLGWMVKRDFIRNIILALIAVLILILLRYFVFATFKVHKDATNSYFSNGDVVVVNR---NRTPKYK-DFI
1 :: ::|| :: || |||: : | : : = I : I I
VTKQKEKRGRRWPWFVAVCWATLRLFVFSNyWEGKSMMPTLESGNLLIVNKLSYDIGPIRRFDII
10 20 30 40 50 60

537 567 597 627 657 687 717 747
VYKVGKIF-YISRVIGEPNQKVRVMDDILYLNDVFKDEPYIEKMKNAYSEKKDGQMPFTSDFSVETLTRNKESRVPKGSY
1= I |= UK I :: :||||:| I I I I I I I ; : I I I = I •' I : : I I I I
VFHANKKEDYVKRVIGLPGDRIAYKNDILYVNGKKVDEPYLRPYKQ---KLLDGRL--TGDFTLEEVT--GKTRVPPGCI
80 90 100 110 120 130 140

777 807 837 867 897 927 957 987
LVLNDNRQNKNDSRKFGLIKEKDIRGVITFKVYPLSEFGFTASE**KNGII*YHSFYVIKWLRNIFF*DR*NF**RXXN*

FVLGDNRLSSWDSRHFGFVXINQIVGKVDFRYWPFKQFAFQF
150 160 170 180



SEQ ID 8790 (GBS7) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 1 (lane 4; MW 46kDa). It was also expressed in E.coli as a His-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 2 (lane 4; MW 21kDa). The GBS7-His fusion
product was purified (Figure 189, lane 6) and used to immunise mice. The resulting antiserum was used for
FACS (Figure 262), which confirmed that the protein is immunoaccessible on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1332 

A DNA sequence (GBSx1416) was identified in S.agalactiae <SEQ ID 4085> which encodes the amino
acid sequence <SEQ ID 4086>. Analysis of this protein sequence reveals the following:

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1099 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9795> which encodes amino acid sequence <SEQ ID 9796> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF25804 GB:AF172173 pyruvate kinase [Streptococcus thermophilus] 
Identities = 413/500 (82%) , Positives = 451/500 (89%) 



Query: 1 MNKRVKIVATLGPAVEFRGGKKFGESGYWGESLDVEASAEKIAQLIKEGANVFRFNFSHG 60
MNKRVKIVATLGPAVE RGGKKFGE GYW E LD +ASA+ IAQLI +EGANVFRFNFSHG
Sbjct: 1 MNKRVKIVATLGPAVEIRGGKKFGEDGYWSEKLDPDASAKNIAQLIEEGANVFRFNFSHG 60

Query: 61 DHAEQGARMATVRKAEEIAGQKVGFLLDTKGPEIRTELFEDGADFHSYTTGTKLRVATKQ 120
+HAEQG RM VR AE IAGQKVGFLLDTKGPEIRTELFE A ++Y TG ++R+ATKQ
Sbjct: 61 NHAEQGERMDVVRMAESIAGQKVGFLLDTKGPEIRTELFEGDAKEYAYKTGEQIRIATKQ 120

Query: 121 GIKSTPEVIALNVAGGLDIFDDVEVGKQILVDDGKLGLTVFAKDKDTREFEVVVENDGLI 180
G+KST +VIALNVAG LDIFDDVEVGKQ+LVDDGKLGL V KD + REF V VENDG+I
Sbjct: 121 GLKSTRDVIALNVAGALDIFDDVEVGKQVLVDDGKLGLRVVDKDAEKREFIVEVENDGII 180

Query: 181 GKQKGVNIPYTKIPFPALAERDNADIRFGLEQGLNFIAISFVRTAKDVNEVRAICEETGN 240
KQKGVNIPYTKIPFPALAERDNADIRFGLEQG+NFIAISFVRTAKDV EVRAICEETGN
Sbjct: 181 AKQKGVNIPYTKIPFPALAERDNADIRFGLEQGINFIAISFVRTAKDVQEVRAICEETGN 240

Query: 241 GHVKLFAKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK 300
GHVKL AKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK
Sbjct: 241 GHVKLLAKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK 300

Query: 301 AVITATNMLETMTDKPRATRSEVSDVFNAVIDGTDATMLSGESANGKYPVESVRTMATID 360
V+TATNMLETMT+KPRATRSEVSDVFNAVIDGTDATMLSGESANG YPVESVRTMATI
Sbjct: 301 IVVTATNMLETMTEKPRATRSEVSDVFNAVIDGTDATMLSGESANGPYPVESVRTMATIH 360

Query: 361 KNAQTLLNEYGRLDSSAFPRNNKTDVIASAVKDATHSMDIKLVVTITETGNTARAISKFR 420
KNAQTLL EYGRL+SS F R++ T+V+ASAVKDAT+SM I+L+V +TE+GNTA I +R
Sbjct: 361 KNAQTLLKEYGRLNSSTFDRSSNTEVVASAVKDATNSMHIQLIVALTESGNTASLIDTYR 420

Query: 421 PDADILAVTFDEKVQRSLMINWGVIPVLADKPASTDDMFEVAERVALEAGFVESGDNIVI 480
P+ADI A+TFDE Q+SLM+NWGVIPV+ + P+STDDMFEVAERVALE+G VESGDNIVI
Sbjct: 421 PEADIWAITFDELTQKSLMLNWGVIPVVTETPSSTDDMFEVAERVALESGLVESGDNIVI 480

Query: 481 VAGVPVGTGGTNTMRVRTVK 500
VAGVPVG+G TNTMR+RTVK
Sbjct: 481 VAGVPVGSGNTNTMRIRTVK 500





A related DNA sequence was identified in S.pyogenes <SEQ ID 4087> which encodes the amino acid 
sequence <SEQ ID 4088>. Analysis of this protein sequence reveals the following: 

Possible site: 54 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0915 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

RGD motif: 272-274 

The protein has homology with the following sequences in the databases: 

>GP:AAF25804 GB:AF172173 pyruvate kinase [Streptococcus thermophilus] 
Identities = 404/500 (80%) , Positives = 457/500 (90%) 
Query: 1 MNKRVKIVATLGPAVEIRGGKKYGEDGYWAGQLDVEESAKKIAELIEAGANVFRFNFSHG 60
MNKRVKIVATLGPAVEIRGGKK+GEDGYW+ +LD + SAK IA+LIE GANVFRFNFSHG
Sbjct: 1 MNKRVKIVATLGPAVEIRGGKKFGEDGYWSEKLDPDASAKNIAQLIEEGANVFRFNFSHG 60

Query: 61 DHKEQGDRMATVRLAEEIARQKVGFLLDTKGPEMRTELFADDAKEFSYVTGEKIRVATTQ 120
+H EQG+RM VR+AE IA QKVGFLLDTKGPE+RTELF DAKE++Y TGE+IR+AT Q
Sbjct: 61 NHAEQGERMDVVRMAESIAGQKVGFLLDTKGPEIRTELFEGDAKEYAYKTGEQIRIATKQ 120

Query: 121 GIQSTRDVIALNVAGSLDIYDEVEVGHTILIDDGKLGLKVIDKDIATRQFIVEVENDGII 180
G++STRDVIALNVAG+LDI+D+VEVG +L+DDGKLGL+V+DKD R+FIVEVENDGII
Sbjct: 121 GLKSTRDVIALNVAGALDIFDDVEVGKQVLVDDGKLGLRVVDKDAEKREFIVEVENDGII 180

Query: 181 AKQKGVNIPNTKIPFPALAERDNADIRFGLEQGLNFIAISFVRTAKDVEEVREICRETGN 240
AKQKGVNIP TKIPFPALAERDNADIRFGLEQG+NFIAISFVRTAKDV+EVR IC ETGN
Sbjct: 181 AKQKGVNIPYTKIPFPALAERDNADIRFGLEQGINFIAISFVRTAKDVQEVRAICEETGN 240

Query: 241 DHVQLFAKIENQQGIDNLDEIIEAADGIMIARGDMGIEVPFEMVPVFQKMIITKVNAAGK 300
HV+L AKIENQQGIDN+DEIIEAADGIMIARGDMGIEVPFEMVPV+QKMIITKVNAAGK
Sbjct: 241 GHVKLLAKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK 300

Query: 301 AVITATNMLETMTEKPRATRSEVSDVFNAVIDGTDATMLSGESANGKYPVESVRTMATID 360
V+TATNMLETMTEKPRATRSEVSDVFNAVIDGTDATMLSGESANG YPVESVRTMATI
Sbjct: 301 IVVTATNMLETMTEKPRATRSEVSDVFNAVIDGTDATMLSGESANGPYPVESVRTMATIH 360

Query: 361 RNAQTLLNEYGRLDSSAFPRTNKTDVIASAVKDATHSMDIKLVVTITETGNTARAISKFR 420
+NAQTLL EYGRL+SS F R++ T+V+ASAVKDAT+SM I+L+V +TE+GNTA I +R
Sbjct: 361 KNAQTLLKEYGRLNSSTFDRSSNTEVVASAVKDATNSMHIQLIVALTESGNTASLIDTYR 420

Query: 421 PDADILAVTFDEKVQRALMINWGVIPVLAEKPASTDDMFEVAERVAVEAGLVQSGDNIVI 480
P+ADI A+TFDE Q++LM+NWGVIPV+ E P+STDDMFEVAERVA+E+GLV+SGDNIVI
Sbjct: 421 PEADIWAITFDELTQKSLMLNWGVIPVVTETPSSTDDMFEVAERVALESGLVESGDNIVI 480

Query: 481 VAGVPVGTGGTNTMRVRTVK 500
VAGVPVG+G TNTMR+RTVK
Sbjct: 481 VAGVPVGSGNTNTMRIRTVK 500

An alignment of the GAS and GBS proteins is shown below. 

Identities = 440/500 (88%) , Positives = 462/500 (92%) 

Query: 1 MNKRVKIVATLGPAVEFRGGKKFGESGYWGESLDVEASAEKIAQLIKEGANVFRFNFSHG 60
MNKRVKIVATLGPAVE RGGKK+GE GYW LDVE SA+KIA+LI+ GANVFRFNFSHG
Sbjct: 1 MNKRVKIVATLGPAVEIRGGKKYGEDGYWAGQLDVEESAKKIAELIEAGANVFRFNFSHG 60

Query: 61 DHAEQGARMATVRKAEEIAGQKVGFLLDTKGPEIRTELFEDGADFHSYTTGTKLRVATKQ 120
DH EQG RMATVR AEEIA QKVGFLLDTKGPE+RTELF DA SY TG K+RVAT Q
Sbjct: 61 DHKEQGDRMATVRLAEEIARQKVGFLLDTKGPEMRTELFADDAKEFSYVTGEKIRVATTQ 120

Query: 121 GIKSTPEVIALNVAGGLDIFDDVEVGKQILVDDGKLGLTVFAKDKDTREFEVVVENDGLI 180
GI+ST +VIALNVAG LDI+D+VEVG IL+DDGKLGL V KD TR+F V VENDG+I
Sbjct: 121 GIQSTRDVIALNVAGSLDIYDEVEVGHTILIDDGKLGLKVIDKDIATRQFIVEVENDGII 180

Query: 181 GKQKGVNIPYTKIPFPALAERDNADIRFGLEQGLNFIAISFVRTAKDVNEVRAICEETGN 240
KQKGVNIP TKIPFPALAERDNADIRFGLEQGLNFIAISFVRTAKDV EVR IC ETGN
Sbjct: 181 AKQKGVNIPNTKIPFPALAERDNADIRFGLEQGLNFIAISFVRTAKDVEEVREICRETGN 240

Query: 241 GHVKLFAKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK 300
HV+LFAKIENQQGIDN+DEIIEAADGIMIARGDMGIEVPFEMVPV+QKMIITKVNAAGK
Sbjct: 241 DHVQLFAKIENQQGIDNLDEIIEAADGIMIARGDMGIEVPFEMVPVFQKMIITKVNAAGK 300

Query: 301 AVITATNMLETMTDKPRATRSEVSDVFNAVIDGTDATMLSGESANGKYPVESVRTMATID 360
AVITATNMLETMT+KPRATRSEVSDVFNAVIDGTDATMLSGESANGKYPVESVRTMATID
Sbjct: 301 AVITATNMLETMTEKPRATRSEVSDVFNAVIDGTDATMLSGESANGKYPVESVRTMATID 360

Query: 361 KNAQTLLNEYGRLDSSAFPRNNKTDVIASAVKDATHSMDIKLVVTITETGNTARAISKFR 420
+NAQTLLNEYGRLDSSAFPR NKTDVIASAVKDATHSMDIKLVVTITETGNTARAISKFR
Sbjct: 361 RNAQTLLNEYGRLDSSAFPRTNKTDVIASAVKDATHSMDIKLVVTITETGNTARAISKFR 420

Query: 421 PDADILAVTFDEKVQRSLMINWGVIPVLADKPASTDDMFEVAERVALEAGFVESGDNIVI 480
PDADILAVTFDEKVQR+LMINWGVIPVLA+KPASTDDMFEVAERVA+EAG V+SGDNIVI
Sbjct: 421 PDADILAVTFDEKVQRALMINWGVIPVLAEKPASTDDMFEVAERVAVEAGLVQSGDNIVI 480

Query: 481 VAGVPVGTGGTNTMRVRTVK 500
VAGVPVGTGGTNTMRVRTVK
Sbjct: 481 VAGVPVGTGGTNTMRVRTVK 500

A related GBS gene <SEQ ID 8791> and protein <SEQ ID 8792> were also identified. Analysis of this
protein sequence reveals the following:

Belongs to Glycolysis/gluconeogenesis pathway. Proteins belonging to this metabolic
pathway have been experimentally detected on the surface of Streptococci.

The protein has homology with the following sequences in the databases:

>GP|6708108|gb|AAF25804.1|AF172173_2|AF172173 pyruvate kinase
{Streptococcus thermophilus}

Score = 821 bits (2098), Expect = 0.0
Identities = 412/500 (82%), Positives = 450/500 (89%)

Query: 1 MNKRVKIVATLGPAVEFRGGKKFGESGYWGESLDVEASAEKIAQLIKEGANVFRFNFSHG 60
MNKRVKIVATLGPAVE RGGKKFGE GYW E LD +ASA+ IAQLI +EGANVFRFNFSHG
Sbjct: 1 MNKRVKIVATLGPAVEIRGGKKFGEDGYWSEKLDPDASAKNIAQLIEEGANVFRFNFSHG 60

Query: 61 DHAEQGARMATVRKAEEIAGQKVGFLLDTKGPEIRTELFEDGADFHSYTTGTKLRVATKQ 120
+HAEQG RM VR AE IAGQKVGFLLDTKGPEIRTELFE A ++Y TG ++R+ATKQ
Sbjct: 61 NHAEQGERMDVVRMAESIAGQKVGFLLDTKGPEIRTELFEGDAKEYAYKTGEQIRIATKQ 120

Query: 121 GIKSTPEVIALNVAGGLDIFDDVEVGKQILVDDGKLGLTVFAKDKDTREFEVVVENDGLI 180
G+KST +VIALNVAG LDIFDDVEVGKQ+LVDDGKLGL V KD + REF V VENDG+I
Sbjct: 121 GLKSTRDVIALNVAGALDIFDDVEVGKQVLVDDGKLGLRVVDKDAEKREFIVEVENDGII 180

Query: 181 GKQKGVNIPYTKIPFPALAERDNADIRFGLEQGLNFIAISFVRTAKDVNEVRAICEETGN 240
KQKGVNIPYTKIPFPALAERDNADIRFGLEQG+NFIAISFVRTAKDV EVRAICEETG
Sbjct: 181 AKQKGVNIPYTKIPFPALAERDNADIRFGLEQGINFIAISFVRTAKDVQEVRAICEETGN 240

Query: 241 GHVKLFAKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK 300
GHVKL AKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK
Sbjct: 241 GHVKLLAKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK 300

Query: 301 AVITATNMLETMTDKPRATRSEVSDVFNAVIDGTDATMLSGESANGKYPVESVRTMATID 360
V+TATNMLETMT+KPRATRSEVSDVFNAVIDGTDATMLSGESANG YPVESVRTMATI
Sbjct: 301 IVVTATNMLETMTEKPRATRSEVSDVFNAVIDGTDATMLSGESANGPYPVESVRTMATIH 360

Query: 361 KNAQTLLNEYGRLDSSAFPRNNKTDVIASAVKDATHSMDIKLVVTITETGNTARAISKFR 420
KNAQTLL EYGRL+SS F R++ T+V+ASAVKDAT+SM I+L+V +TE+GNTA I +R
Sbjct: 361 KNAQTLLKEYGRLNSSTFDRSSNTEVVASAVKDATNSMHIQLIVALTESGNTASLIDTYR 420

Query: 421 PDADILAVTFDEKVQRSLMINWGVIPVLADKPASTDDMFEVAERVALEAGFVESGDNIVI 480
P+ADI A+TFDE Q+SLM+NWGVIPV+ + P+STDDMFEVAERVALE+G VESGDNIVI
Sbjct: 421 PEADIWAITFDELTQKSLMLNWGVIPVVTETPSSTDDMFEVAERVALESGLVESGDNIVI 480

Query: 481 VAGVPVGTGGTNTMRVRTVK 500
VAGVPVG+G TNTMR+RTVK
Sbjct: 481 VAGVPVGSGNTNTMRIRTVK 500



SEQ ID 8792 (GBS330) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 73 (lane 5; MW 59kDa). 

GBS330-His was purified as shown in Figure 213, lane 6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1333

A DNA sequence (GBSx1417) was identified in S.agalactiae <SEQ ID 4089> which encodes the amino
acid sequence <SEQ ID 4090>. Analysis of this protein sequence reveals the following:

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0632 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF25803 GB:AF172173 phosphofructokinase [Streptococcus thermophilus]
Identities = 270/337 (80%), Positives = 302/337 (89%), Gaps = 1/337 (0%)

Query: 1 MKRIAVLTSGGDAPGMNAAIRAVVRKAISEGMEVYGINQGYYGMVTGDIFPLDANSVGDT 60
MKRIAVLTSGGDAPGMNAA+RAW KAISEG+EV+GIN+GY GMV GDIF LDA V +
Sbjct: 1 MKRIAVLTSGGDAPGMNAAVRAWLKAISEGIEVFGINRGYAGMVEGDIFKLDAKRVENI 60

Query: 61 INRGGTFLRSARYPEFAELEGQLKGIEQLKKHGIEGVWIGGDGSYHGAMRLTEHGFPAV 120
++RGGTFL+SARYPEFA+LEGQLKGIEQLKK+GIEGVWIGGDGSYHGAMRLTEHGFPAV
Sbjct: 61 LSRGGTFLQSARYPEFAQLEGQLKGIEQLKKYGIEGVWIGGDGSYHGAMRLTEHGFPAV 120

Query: 121 GLPGTIDNDIVGTDYTIGFDTAVATAVENLDRLRDTSASHNRTFWEVMGRNAGDIALWS 180
GLPGTIDNDIVGTDYTIGFDTAVATA E LD+++DT+ SH RTFWEVMGRNAGDIALW+
Sbjct: 121 GLPGTIDNDIVGTDYTIGFDTAVATATEALDKrQDTAFSHGRTFWEVMGRNAGDIALWA 180

Query: 181 GIAAGADQIIVPEEEFNIDEWSNVRAGYAAG-KHHQIIVLAEGVMSGDEFAKTMKAAGD 239
GIA+GADQIIVPEEE++I+EW V+ GY +G K H IIVLAEGVM +EFA MK AGD
Sbjct: 181 GIASGADQIIVPEEEYDINEVTOKVKEGYESGEKSHHIIVLAEGVMGAEEFAAKMKEAGD 240

Query: 240 DSDLRVTNLGHLLRGGSPTARDRVLASRMGAYAVQLLKEGRGGLAVGVHNEEMVESPILG 299
SDLR TNLGH++RGGSPTARDRVLAS MGA+AV LLKEG GG+AVG+HNE++VESPILG
Sbjct: 241 TSDLRATNLGHVIRGGSPTARDRVLASWMGAHAVDLLKEGIGGVAVGIHNEQLVESPILG 300

Query: 300 LAEEGALFSLTDEGKIVVNNPHKADLRIAALNRDLAN 336
AEEGALFSLT++GKI+VNNPHKA L A LNR LAN
Sbjct: 301 TAEEGALFSLTEDGKIIVNNPHKARLDFAELNRSLAN 337

Proteins in the glycolysis/gluconeogenesis pathway have been experimentally detected on the surface of 
Streptococci. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4091> which encodes the amino acid
sequence <SEQ ID 4092>. Analysis of this protein sequence reveals the following:

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0632 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 274/336 (81%) , Positives = 306/336 (90%) , Gaps = 1/336 (0%) 

Query: 1 MKRIAVLTSGGDAPGMNAAIRAVVRKAISEGMEVYGINQGYYGMVTGDIFPLDANSVGDT 60
MKRIAVLTSGGDAPGMNAAIRAVVRKAISEGMEVYGIN+GY GMV GDIFPL + VGD
Sbjct: 1 MKRIAVLTSGGDAPGMNAAIRAVVRKAISEGMEVYGINRGYAGMVDGDIFPLGSKEVGDK 60

Query: 61 INRGGTFLRSARYPEFAELEGQLKGIEQLKKHGIEGVWIGGDGSYHGAMRLTEHGFPAV 120
I+RGGTFL SARYPEFA+LEGQL GIEQLKKHGIEGVWIGGDGSYHGAMRLTEHGFPAV
Sbjct: 61 ISRGGTFLYSARYPEFAQLEGQLAGIEQLKKHGIEGWVIGGDGSYHGAMRLTEHGFPAV 120

Query: 121 GLPGTIDNDIVGTDYTIGFDTAVATAVENLDRLRDTSASHNRTFVVEVMGRNAGDIALWS 180
G+PGTIDNDI GTDYTIGFDTAV TAVE +D+LRDTS+SH RTFWEVMGRNAGDIALW+
Sbjct: 121 GIPGTIDNDIAGTDYTIGFDTAVNTAVEAIDKLRDTSSSHGRTFWEVMGRNAGDIALWA 180

Query: 181 GIAAGADQIIVPEEEFNIDEWSNVRAGYA-AGKHHQIIVLAEGVMSGDEFAKTMKAAGD 239
GIA+GADQIIVPEEEF+I++V S ++ + GK+H IIVLAEGVMSG+ FA+ +K AGD
Sbjct: 181 GIASGADQIIVPEEEFDIEKVASTIQYDFEHKGKNHHIIVLAEGVMSGEAFAQKLKEAGD 240

Query: 240 DSDLRVTNLGHLLRGGSPTARDRVLASRMGAYAVQLLKEGRGGLAVGVHNEEMVESPILG 299
SDLRVTNLGH+LRGGSPTARDRV+AS MG++AV+LLK+G+GGLAVG+HNEE+VESPILG
Sbjct: 241 KSDLRVTNLGHILRGGSPTARDRVIASWMGSHAVELLKDGKGGLAVGIHNEELVESPILG 300

Query: 300 IAEEGALFSLTDEGKIVVNNPHKADLRLAALNRDLA 335
AEEGALFSLT+EGKI+VNNPHKA L AALNR L+
Sbjct: 301 TAEEGALFSLTEEGKIIVNNPHKARLDFAALNRSLS 336



SEQ ID 4090 (GBS313) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 45 (lane 5; MW 41kDa).

GBS313-His was purified as shown in Figure 204, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

25 Example 1334 

A DNA sequence (GBSxl418) was identified in S.agalactiae <SEQ ID 4093> which encodes the amino 
acid sequence <SEQ ID 4094>. This protein is predicted to be DNA polymerase III alpha subunit (dnaE). 
Analysis of this protein sequence reveals the following: 

Possible site: 55 
30 >>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 . 1446 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

35 bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

There is also homology to SEQ ID 4096. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1335

A DNA sequence (GBSx1419) was identified in S.agalactiae <SEQ ID 4097> which encodes the amino
acid sequence <SEQ ID 4098>. This protein is predicted to be YHCF (farR). Analysis of this protein
sequence reveals the following:

Possible site: 52
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.3316 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04102 GB:AP001508 transcriptional regulator (GntR family)
[Bacillus halodurans]
Identities = 51/116 (43%), Positives = 79/116 (67%)

Query: 5 FNEKSPIYSQIAEHIKMQIVSQEIKSGDQLPTVRELAQEAGVNPNTMQRAFTELEREGMV 64
F+ PIY Q+AE +K QIV E++ G++LP+VR++ EA VNPNT+QR + ELE +V
Sbjct: 5 FHSSEPIYLQIAERVKRQIWGELRIfiEKLPSVRDMGIEANVNPNTVQRTYRELEGLKIV 64

Query: 65 FSQRTSGRFVTEDNLLIGKIRQQVAKAEIATFVMNMKKIGYKLDEITVALDHFIKE 120
S+R G FVTED ++ IR+Q+ + E++ FV M+++GY +EI L+ ++ E
Sbjct: 65 ESKRGQGTFVTEDEQVLQAIREQMKETEISHFVQGMREMGYSDNEIQAGLESYLTE 120
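Each alignment is summarised by a line of counts and percentages, for example 51 identical positions over an aligned length of 116. The sketch below shows one way such a summary line could be parsed and the identity percentage recomputed. It assumes that the printed percentage is the ratio rounded down to a whole number, which is consistent with the Identities figures quoted in this section but is not stated by the source; the function names are illustrative.

```python
import re

# Matches summary fields such as "Identities = 51/116 (43%)".
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def summary_counts(line):
    """Return {field: (count, aligned_length)} for one summary line."""
    return {name: (int(n), int(d)) for name, n, d in FIELD.findall(line)}


def percent_truncated(count, length):
    """Integer percentage, rounded down (an assumption, see the lead-in)."""
    return (100 * count) // length


line = "Identities = 51/116 (43%), Positives = 79/116 (67%)"
counts = summary_counts(line)
print(percent_truncated(*counts["Identities"]))  # 43
```

The same helper can be applied to the Gaps field where one is reported; no claim is made here about how the Positives percentage was derived by the original software.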

A related DNA sequence was identified in S.pyogenes <SEQ ID 4099> which encodes the amino acid 
sequence <SEQ ID 4100>. Analysis of this protein sequence reveals the following: 

Possible site: 25
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.2075 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 80/120 (66%), Positives = 100/120 (82%) 

Query: 1 MAWEFNEKSPIYSQIAEHIKMQIVSQEIKSGDQLPTVRELAQEAGVNPNTMQRAFTELER 60
M+W+F EKSPIY+QIA+H+ MQI+SQEIKSGDQLPTVRE A+ AGVNPNTMQRAFTELER
Sbjct: 1 MSWKFEEKSPIYAQIAQHVMMQIISQEIKSGDQLPTVREYAEIAGVNPNTMQRAFTELER 60

Query: 61 EGMVFSQRTSGRFVTEDNLLIGKIRQQVAKAEIATFVMNMKKIGYKLDEITVALDHFIKE 120
EGMV+SQRT+GRFVT+D LI + R+++A +EL +F+ NM K+G+ EI L F+KE
Sbjct: 61 EGMVYSQRTAGRFVTDDQKLIARKRRELAISKLESFITNMTKMGFSHTEIIPVLTSFLKE 120

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1336 

A DNA sequence (GBSx1420) was identified in S.agalactiae <SEQ ID 4101> which encodes the amino
acid sequence <SEQ ID 4102>. This protein is predicted to be ABC transporter, ATP-binding protein
(yhcG). Analysis of this protein sequence reveals the following:

Possible site: 26
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.2757 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12735 GB:Z99108 similar to glycine betaine/L-proline
transport [Bacillus subtilis]
Identities = 87/228 (38%), Positives = 150/228 (65%), Gaps = 1/228 (0%)

Query: 5 LQLHHVTKKYHKHTAVNDVTVSIPTGKIIGLLGPNGSGKTTIIKMINGLLQPDKGDIVID 64
++L HV+KKY +HTAVNDV++++ +G+I GL+GPNGSGK+T +KM+ GLL P G + +D
Sbjct: 3 IKLEHVSKKYGRHTAVNDVSITLSSGRIYGLIGPNGSGKSTTLroTOGLLFPTSGFVKVD 62

Query: 65 GYRPSVETKKIISYLPDTSYLQENMKIKDVVTLFEDFYNDFDSKVAYQLFEDLNLNPRER 124
+ + E + +YL + + +KD+V ++ + DF ++ Y+L ++ LNP ++
Sbjct: 63 EEQOTREMWQTAYLTEI.DMFYPHFTVKI)^WHFYQSQFPDFHTEQWKLLNEMQLNPEKK 122

Query: 125 LKNLSKGNKEKVQLILVMSRKARLYILDEPIGGVDPAARDYILKTIISNYSNDAS-VLIS 183
+K LSKGN+ +++++L ++R+A + +LDEP G+DP RD I+ +++S + V+I +
Sbjct: 123 IKKLSKGNRGRLKIVIAIARRADVILLDEPFSGLDPMVRDSITOSLVSYIDFEQQIWIA 182

Query: 184 THLISDIEPILDEVIFLKEGEIDLQGNADDLREEHNCSIDALFRERFK 231
TH I +IE +LDEVI L GE Q +D+RE+ S+ F+ + +
Sbjct: 183 THEIDEIETLLDEVIILANGEKVAQREVEDIREQEGMSVLQWFKSKME 230

A related DNA sequence was identified in S.pyogenes <SEQ ID 4103> which encodes the amino acid 
sequence <SEQ ID 4104>. Analysis of this protein sequence reveals the following: 

Possible site: 13
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.1983 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 171/231 (74%) , Positives = 200/231 (86%) 

Query: 1 MTQLLQLHHVTKKYHKHTAVNDVTVSIPTGKIIGLLGPNGSGKTTIIKMINGLLQPDKGD 60
M LLQLHHV+K Y + A++D+T++IP GKIIGLLGPNGSGKTT+IK+INGLLQP+KG+
Sbjct: 1 MAHLLQLHHVSKSYREKKAIDDLTITIPNGKIIGLLGPNGSGKTTLIKLINGLLQPNKGE 60

Query: 61 IVIDGYRPSVETKKIISYLPDTSYLQENMKIKDVVTLFEDFYNDFDSKVAYQLFEDLNLN 120
IVIDGYRP VETKKIISYLPDT+YL ENM+IKD++ F DFY+DFD A L DL L+
Sbjct: 61 IVIDGYRPCVETKKIISYLPDTTYLNENMRIKDMLEFFSDFYSDFDKSKATSLLRDLELD 120

Query: 121 PRERLKNLSKGNKEKVQLILVMSRKARLYILDEPIGGVDPAARDYILKTIISNYSNDASV 180
P +R K LSKGNKEKVQLILVMSRKARLY+LDEPIGGVDPAARDYILKTII++Y +ASV
Sbjct: 121 PEDRFKTLSKGNKEKVQLILVMSRKARLYVLDEPIGGVDPAARDYILKTIINSYCENASV 180

Query: 181 LISTHLISDIEPILDEVIFLKEGEIDLQGNADDLREEHNCSIDALFRERFK 231
+ISTHLISDIEPILDEVIFLK+G + L GNADDLR+E+ SID+LFRE +K
Sbjct: 181 IISTHLISDIEPILDEVIFLKQGRLFLSGNADDLRQEYQQSIDSLFRETYK 231

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1337 

A DNA sequence (GBSx1421) was identified in S.agalactiae <SEQ ID 4105> which encodes the amino
acid sequence <SEQ ID 4106>. Analysis of this protein sequence reveals the following:

Possible site: 48
>>> Seems to have an uncleavable N-term signal seq

Final Results
bacterial membrane Certainty=0.7156 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 4107> which encodes the amino acid
sequence <SEQ ID 4108>. Analysis of this protein sequence reveals the following:

Possible site: 28
>>> Seems to have a cleavable N-term signal seq.

Final Results
bacterial membrane Certainty=0.5607 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

An alignment of the GAS and GBS proteins is shown below.

Identities = 116/267 (43%), Positives = 165/267 (61%), Gaps = 13/267 (4%)

Query: 1 MFGKLLKYELKSVGKWYLTLNAAVLLVSIILGLVLKALG GNFSTDTNSTSAQIFT 55
MFGKLLKYE +S+GKWY LNA V+ ++ IL +K G F TN ++
Sbjct: 1 MFGKLLKYEFRSIGKWYFALNAFVIAIAAILSFTIKLFAQSNSDGLFGVLTN KMLP 56

Query: 56 IILVLLLAMVISGSLLSTLAIIIKRFYSNIFGRQGYLTLTLPVTTNQIICSKLLASLLWS 115
+ L L +I+GSLLSTL IIIKRF ++FG +GYLTLTLPV ++QII SKLLAS + S
Sbjct: 57 LTLGLTFGSLIAGSLLSTLLIIIKRFSKSVFGWEGYLTLTLPVNSHQIILSKLLASFICS 116

Query: 116 IFNIFIVIIGIILVILPLVGIGQFWAFPEIYKIISSSNAPIiFIAYFFLSYVAGTLLIYL 175
+FN I+ I +VI+P+ I + + F +K+ N +AY LS LLIYL
Sbjct: 117 VFNTIILAFAIAIVIVPMFNINELLEGFFNSFKMDYFIIMLTVLAYVLLSTFTSILLIYL 176

Query: 176 SIAVGQLFTNKRVLMGIVSYFGISLLITFLTLIIDSIFHIDLENSHANA-TFSQPVLLY- 233
SI++GQLF+N+R LM ++YF + +LI+ + S HI N+ A++ F++ +Y
Sbjct: 177 SISIGQLFSNRRGLMAFIAYFILVILISVAATYVHS--HIFNINTSADSFPFTEQKTIYL 234

Query: 234 NILVSIVEIAIFYMLTHSIIKYKLNIQ 260
IL +E+ +FY+ T+ IIK KLN+Q
Sbjct: 235 LILEQFIEMIMFYLATNFIIKNKLNLQ 261
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1338 

A DNA sequence (GBSx1422) was identified in S.agalactiae <SEQ ID 4109> which encodes the amino
acid sequence <SEQ ID 4110>. Analysis of this protein sequence reveals the following:

Possible site: 17
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.5890 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein is similar to ORF24 from S.faecalis.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1339 

A DNA sequence (GBSx1423) was identified in S.agalactiae <SEQ ID 4111> which encodes the amino
acid sequence <SEQ ID 4112>. Analysis of this protein sequence reveals the following:

Possible site: 61
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.3316 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein is similar to ORF23 from S.faecalis. No corresponding DNA sequence was identified in
S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1340 

A DNA sequence (GBSx1424) was identified in S.agalactiae <SEQ ID 4113> which encodes the amino
acid sequence <SEQ ID 4114>. Analysis of this protein sequence reveals the following:

Possible site: 25
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.4256 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein is similar to ORF22 from S.faecalis. No corresponding DNA sequence was identified in
S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1341 

A DNA sequence (GBSx1425) was identified in S.agalactiae <SEQ ID 4115> which encodes the amino
acid sequence <SEQ ID 4116>. Analysis of this protein sequence reveals the following:

Possible site: 39
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-13.37 Transmembrane 62 - 78 ( 55 - 84)
INTEGRAL Likelihood = -8.44 Transmembrane 19 - 35 ( 14 - 41)
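Predicted membrane-spanning segments are reported one per INTEGRAL line, with a likelihood score, a residue range, and a second range in parentheses. The sketch below shows how such lines could be read into structured records. It is a minimal illustration only: the function name and the field names `core` and `extended` are assumptions introduced here, and the interpretation of the parenthesised range as a wider flanking region is likewise an assumption rather than something stated in this document.

```python
import re

# Hypothetical parser for lines such as
# "INTEGRAL Likelihood =-13.37 Transmembrane 62 - 78 ( 55 - 84)".
TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return one dict per INTEGRAL line found in the text."""
    segments = []
    for m in TM.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),      # first range
            "extended": (int(m.group(4)), int(m.group(5))),  # range in parentheses
        })
    return segments


report = """
INTEGRAL Likelihood =-13.37 Transmembrane 62 - 78 ( 55 - 84)
INTEGRAL Likelihood = -8.44 Transmembrane 19 - 35 ( 14 - 41)
"""
for segment in parse_tm_segments(report):
    print(segment["likelihood"], segment["core"], segment["extended"])
```

Sorting the returned list by residue position rather than by likelihood gives the order in which the segments occur along the sequence.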

Final Results
bacterial membrane Certainty=0.6349 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein is similar to ORF21 from S.faecalis. 



WO 02/34771 



-1478- 



PCT/GB01/04789 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4117> which encodes the amino acid 
sequence <SEQ ID 4118>. Analysis of this protein sequence reveals the following: 



Possible site: 37
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.2444 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 54/236 (22%), Positives = 95/236 (39%), Gaps = 12/236 (5%)

Query: 204 KDGKLRLMKNVWWEYDKLPHMLIAGGTGGGKTYFILTLIEALLHTDSKLYILDPKN 259
+ GK+ ++K+ DK H IAG +G GK Y LT ++L S L I+DPK
Sbjct: 14 QQGKIPVIKHFELNLDKGSHWAIAGNSGSGKPY-ALTYFLSVLKPKSGLIIIDPKFDTPS 72

Query: 260 --ADIADLGSVMANVYYRKEDLLSCIETFYEEMMKRSEEMKQMKNYKTGKNYAYLGLPAH 317
A + + +KD+S+ + ++ + + ++L +
Sbjct: 73 QWARENKIAVIHPVENHSKSDFVSQVNEQLNQCATLIQKRQAILYDNPNHQFTHLTI 129

Query: 318 FLIFDEYVAFMEMLGTKENTAVMNKLKQIVMLGRQAGFFLIIACQRPDAKYLGDGIRDQF 377
+ DE +A E + A + L QI +LG L L QR D + +R+Q
Sbjct: 130 --VIDEVLALSEGVNKNIKEAFFSLLSQIALLGHATKIHLFLGSQRFDHNTIPISVREQL 187

Query: 378 NFRVALGRMSEMGYGMMFGSDVQKDFFLKRIKGRGYVDVGTSVISEFYTPLVPKGY 433
N + +G +++ +F + + GG+V+S PL+ Y
Sbjct: 188 NVLLQIGNINQKTTQFLFPDLDPEGIVIPTGHGTGIIQWDNEHSYQVLPLLCPTY 243

SEQ ID 4116 (GBS109d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total
cell extract is shown in Figure 121 (lane 8 & 9; MW 71kDa) and in Figure 184 (lane 2; MW 71kDa). It was
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure
121 (lane 11; MW 46kDa), Figure 128 (lane 4; MW 46kDa) and Figure 179 (lane 7; MW 46kDa).
GBS109d-His was purified as shown in Figure 232 (lanes 7 & 8). GBS109d-GST was purified as shown in
Figure 236, lane 10.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1342 

A DNA sequence (GBSx1426) was identified in S.agalactiae <SEQ ID 4119> which encodes the amino
acid sequence <SEQ ID 4120>. Analysis of this protein sequence reveals the following:

Possible site: 22
>>> Seems to have a cleavable N-term signal seq.

Final Results
bacterial outside Certainty=0.3000 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1343

A DNA sequence (GBSx1427) was identified in S.agalactiae <SEQ ID 4121> which encodes the amino
acid sequence <SEQ ID 4122>. Analysis of this protein sequence reveals the following:

Possible site: 32
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.4469 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9793> which encodes amino acid sequence <SEQ ID 9794> 
was also identified. 

The protein is similar to ORF20 from S.faecalis. No corresponding DNA sequence was identified in
S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1344 

A DNA sequence (GBSx1428) was identified in S.agalactiae <SEQ ID 4123> which encodes the amino
acid sequence <SEQ ID 4124>. Analysis of this protein sequence reveals the following:

Possible site: 14
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.1367 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1345 

A DNA sequence (GBSx1429) was identified in S.agalactiae <SEQ ID 4125> which encodes the amino
acid sequence <SEQ ID 4126>. Analysis of this protein sequence reveals the following:

Possible site: 22
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-10.77 Transmembrane 39 - 55 ( 34 - 64)
INTEGRAL Likelihood = -6.32 Transmembrane 16 - 32 ( 10 - 35)

Final Results
bacterial membrane Certainty=0.5310 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein is similar to ORF19 from S.faecalis. No corresponding DNA sequence was identified in
S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1346 

A DNA sequence (GBSx1430) was identified in S.agalactiae <SEQ ID 4127> which encodes the amino
acid sequence <SEQ ID 4128>. This protein is predicted to be an antirestriction protein. Analysis of this
protein sequence reveals the following:

Possible site: 22
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.2918 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein is similar to ORF18 from S.faecalis. No corresponding DNA sequence was identified in
S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1347 

A DNA sequence (GBSx1431) was identified in S.agalactiae <SEQ ID 4129> which encodes the amino
acid sequence <SEQ ID 4130>. Analysis of this protein sequence reveals the following:

Possible site: 27
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -3.61 Transmembrane 75 - 91 ( 72 - 94)

Final Results
bacterial membrane Certainty=0.2444 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein is similar to ORF17 from S.faecalis. No corresponding DNA sequence was identified in 
S.pyogenes. 

A related GBS gene <SEQ ID 8793> and protein <SEQ ID 8794> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4
McG: Discrim Score: -7.12
GvH: Signal Score (-7.5): -2.52
Possible site: 43
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -3.61 threshold: 0.0
INTEGRAL Likelihood = -3.61 Transmembrane 37 - 53 ( 34 - 56)
PERIPHERAL Likelihood = 3.66 58
modified ALOM score: 1.22

*** Reasoning Step: 3

Final Results
bacterial membrane Certainty=0.2444 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

100.0/100.0% over 167aa
Enterococcus faecalis
EGAD|14977| hypothetical protein Insert characterized
GP|532550|gb|AAB60016.1||U09422 ORF17 Insert characterized

ORF00720(187 - 690 of 990)
EGAD|14977|15011(1 - 168 of 168) hypothetical protein {Enterococcus faecalis}
GP|532550|gb|AAB60016.1||U09422 ORF17 {Enterococcus faecalis}
%Match = 50.3
%Identity = 100.0 %Similarity = 100.0
Matches = 168 Mismatches = 0 Conservative Sub.s = 0

120 150 180 210 240 270 300 330
L*AKYQLVFKTILIIKPMVGI*TFQERLSQPIMGFLKSSIKSVGTLLLADFLFYGVAQSATPIFYERIDYMKKIRSYTSI
                                ||||||||||||||||||||||||||||||||||||||||||||||||
                                MGFLKSSIKSVGTLLLADFLFYGVAQSATPIFYERIDYMKKIRSYTSI
10 20 30 40

360 390 420 450 480 510 540 570
WSVEKVLYSINDFRLPFPITFTQMTWFWSLFAVMILGNLPPLSMIEGAFLKYFGIPVAFTWFMSTKTFDGKKPYGFLKS
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
WSVEKVLYSINDFRLPFPITFTQMTWFWSLFAVMILGNLPPLSMIEGAFLKYFGIPVAFTWFMSTKTFDGKKPYGFLKS
60 70 80 90 100 110 120

600 630 660 690 720 750 780 810
VIAYALRPKLTYAGKKVTLGRMQPQEAITAVRSEFYGISN*IH*KQSRLE*RRGMLCLL*ACSLQLLISKSRTENTSA*F
||||||||||||||||||||||||||||||||||||||||
VIAYALRPKLTYAGKKVTLGRNQPQEAITAVRSEFYGISN
140 150 160

SEQ ID 8794 (GBS223) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 7; MW 18kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1348

A DNA sequence (GBSx1432) was identified in S.agalactiae <SEQ ID 4131> which encodes the amino
acid sequence <SEQ ID 4132>. Analysis of this protein sequence reveals the following:

Possible site: 37
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.4292 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9791> which encodes amino acid sequence <SEQ ID 9792>
was also identified.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1349 

A DNA sequence (GBSx1433) was identified in S.agalactiae <SEQ ID 4133> which encodes the amino
acid sequence <SEQ ID 4134>. Analysis of this protein sequence reveals the following:






Possible site: 16
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -6.21 Transmembrane 350 - 366 ( 345 - 368)
INTEGRAL Likelihood = -0.32 Transmembrane 171 - 187 ( 171 - 188)

Final Results
bacterial membrane Certainty=0.3484 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1350 

A DNA sequence (GBSx1434) was identified in S.agalactiae <SEQ ID 4135> which encodes the amino
acid sequence <SEQ ID 4136>. Analysis of this protein sequence reveals the following:

Possible site: 45
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-10.30 Transmembrane 154 - 170 ( 148 - 177)
INTEGRAL Likelihood =-10.30 Transmembrane 21 - 37 ( 17 - 50)
INTEGRAL Likelihood =-10.03 Transmembrane 320 - 336 ( 316 - 367)
INTEGRAL Likelihood = -7.43 Transmembrane 346 - 362 ( 337 - 367)
INTEGRAL Likelihood = -7.01 Transmembrane 186 - 202 ( 180 - 206)
INTEGRAL Likelihood = -5.36 Transmembrane 411 - 427 ( 404 - 430)
INTEGRAL Likelihood = -1.17 Transmembrane 386 - 402 ( 386 - 402)

Final Results
bacterial membrane Certainty=0.5118 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1351

A DNA sequence (GBSx1436) was identified in S.agalactiae <SEQ ID 4137> which encodes the amino
acid sequence <SEQ ID 4138>. Analysis of this protein sequence reveals the following:



Possible site: 14
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.6306 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1352

A DNA sequence (GBSx1437) was identified in S.agalactiae <SEQ ID 4139> which encodes the amino
acid sequence <SEQ ID 4140>. Analysis of this protein sequence reveals the following:

Possible site: 22
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.2973 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1353 

A DNA sequence (GBSx1438) was identified in S.agalactiae <SEQ ID 4141> which encodes the amino
acid sequence <SEQ ID 4142>. Analysis of this protein sequence reveals the following:

Possible site: 42
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.3382 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ ID 4144. 

A related GBS gene <SEQ ID 8795> and protein <SEQ ID 8796> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3
McG: Discrim Score: 11.12
GvH: Signal Score (-7.5): 0.27
Possible site: 24
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 4.19 threshold: 0.0
PERIPHERAL Likelihood = 4.19 69
modified ALOM score: -1.34

*** Reasoning Step: 3

Final Results
bacterial outside Certainty=0.3000 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

100.0/100.0% over 332aa
Enterococcus faecalis
EGAD|36209| hypothetical protein Insert characterized
GP|532547|gb|AAB60019.1||U09422 ORF14 Insert characterized

ORF00727(301 - 1299 of 1599)
EGAD|36209|37602(1 - 333 of 333) hypothetical protein {Enterococcus faecalis}
GP|532547|gb|AAB60019.1||U09422 ORF14 {Enterococcus faecalis}
%Match = 61.7
%Identity = 100.0 %Similarity = 100.0
Matches = 333 Mismatches = 0 Conservative Sub.s = 0

249 279 309 339 369 399 429 459
CSKSTTTKYKK*TTNQNRHH*ESR*ETMKLKTLVIGGSGLFLMVTSLLLFVAILFSDEQDSGISNIHYGGVNVSAEVLAH
                           |||||||||||||||||||||||||||||||||||||||||||||||||||||
                           MKLKTLVIGGSGLFLMVFSLLLFVAILFSDEQDSGISNIHYGGVNVSAEVLAH
10 20 30 40 50

489 519 549 579 609 639 669 699
KPMVEKYAKEYGVEEYVNILLAIIQVESGGTAEDVMQSSESLGLPPNSLSTEESIKQGVKYFSELLASSERLSVDLESVI
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
KPMWKYAKEYGVEEYVNILLAIIQVESGGTAEDVMQSSESLGLPPNSLSTEESIKQGVKYFSELLASSERLSVDLESVI
70 80 90 100 110 120 130

729 759 789 819 849 879 909 939
QSYNYGGGFLGYVAmGNOTFELAQSFSKEYSGGEK^SYPNPIAIPINGGWRYNYG>MFYVQLVTQYLVTTEFDDDTVQ
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
QSYNYGGGFLGWANRGNKYTFEIAQSFSKEYSGGEKVSYPMPIAIPINGGWRYNYGimFYVQLVTQYLVTTEFDDDTVQ
150 160 170 180 190 200 210

969 999 1029 1059 1089 1119 1149 1179
AIMDEALKYEGWRYVYGGASPTTSFDCSGLTQWTYGKAGINLPRTAQQQYDVTQHIPLSEAQAGDLVFFHSTYNAGSYIT
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
AIMDEALKYEGWRYVYGGASPTTSFDCSGLTQWTYGKAGINLPRTAQQQYDVTQHIPLSEAQAGDLVFFHSTYNAGSYIT
230 240 250 260 270 280 290

1209 1239 1269 1299 1329 1359 1389 1419
HVGIYLGNNRMFHAGDPIGYADLTSPYWQQHLVGAGRIKQ*ERKI***NLEKIRIKKOTIYQRKRNLVSIRSILIKRL*LP
||||||||||||||||||||||||||||||||||||||||
HVGIYLGNNRMFHAGDPIGYADLTSPYWQQHLVGAGRIKQ
310 320 330

SEQ ID 8796 (GBS155) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 24 (lane 10; MW 38kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 31 (lane 7; MW 62kDa).

The GBS155-GST fusion product was purified (Figure 111; see also Figure 198, lane 74) and used to
immunise mice (lane 1 product; 20 µg/mouse). The resulting antiserum was used for Western blot, FACS,
and in the in vivo passive protection assay (Table III). These tests confirm that the protein is
immunoaccessible on GBS bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1354 

A DNA sequence (GBSx1439) was identified in S.agalactiae <SEQ ID 4145> which encodes the amino
acid sequence <SEQ ID 4146>. Analysis of this protein sequence reveals the following:

Possible site: 52
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.60 Transmembrane 37 - 53 ( 35 - 55)

Final Results
bacterial membrane Certainty=0.4439 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9789> which encodes amino acid sequence <SEQ ID 9790>
was also identified.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1355 

A DNA sequence (GBSx1440) was identified in S.agalactiae <SEQ ID 4147> which encodes the amino
acid sequence <SEQ ID 4148>. Analysis of this protein sequence reveals the following:

Possible site: 40
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -0.00 Transmembrane 391 - 407 ( 391 - 407)

Final Results
bacterial membrane Certainty=0.1001 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9787> which encodes amino acid sequence <SEQ ID 9788> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4149> which encodes the amino acid
sequence <SEQ ID 4150>. Analysis of this protein sequence reveals the following:

Possible site: 19
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm Certainty=0.2027 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 183/669 (27%), Positives = 305/669 (45%), Gaps = 63/669 (9%)

Query: 7 KIINIGVLAHVDAGKTTLTESLLYNSGAITELGSVLKGTTRTDNTLLERQRGITIQTGIT 66
K NIG++AHVDAGKTT TE +LY +G I ++G +G ++ D E++RGITI + T
Sbjct: 9 KTRNIGIMAHVDAGKTTTTERILYYTGKIHKIGETHEGASQMDWMEQEQERGITITSAAT 68

Query: 67 SFQWENTKVNIIDTPGHMDFLAEVYRSLSVLDGAILLISAKDGVQAQTRILFHALRKMGI 126
+ QW+ +VNIIDTPGH+DF EV RSL VLDGA+ ++ ++ GV+ QT ++ + G+
Sbjct: 69 TAQWDGHRVNIIDTPGHVDFTIEVQRSLRVLDGAVTVLDSQSGVEPQTETVWRQATEYGV 128

Query: 127 PTIFFINKIDQNGIDLSTVYQDIKEKLSAEI VI KQKVELYPN 168
P I F NK+D+ G D Q + ++L A +IK K E+Y N
Sbjct: 129 PRIVFANKMDKIGADFLYSVQTLHDRLQANAHPIQLPIGAEDDFRGIIDLIKMKAEIYTN 188

Query: 169 MCVTNFTES EQW-- DTVIEGNDDLLEKYMSGKSLEALELEQEESIRF 213
T+ E E++ + V E ++DL+ KY+ G+ + EL
Sbjct: 189 DLGTDILEEDIPEEYLEQAQEYREKLIEAVAETDEDLMMKYLEGEEITNDELIAGIRKAT 248

Query: 214 HNCSLFPVYHGSAKNNIGIDNLIEVI TNKFYSSTHRGPSE L 254
N FPV GSA N G+ +++ + N + P+
Sbjct: 249 INVEFFPVLCGSAFKNKGVQLMLDAVIAYLPSPLDIPAIKGVNPDTDAEEERPASDEEPF 308

Query: 255 CGNVFKIEYTKKRQRLAYIRLYSGVLHLRDSVRVSEKEKI KVTEMYTS INGELCKI 310
FKI RL + R+YSGVL+ V + K K ++ +M+ + E I
Sbjct: 309 AALAFKIMTDPFVGRLTFFRVYSGVLNSGSYVMNTSKGKRERIGRILQMHANSRQE I 365

Query: 311 DRAYSGEIVILQN-EFLKLNSVLGDTKLLPQRKKIENPHPLLQTTVEPSKPEQREMLLDA 369
+ Y+G+I + L D K + IE P P++Q VEP ++ + A
Sbjct: 366 ETVYAGDIAAAVGLKDTTTGDSLTDEKAKVILESIEVPEPVIQLMVEPKSKADQDKMGVA 425

Query: 370 LLEISDSDPLLRYYVDSTTHEIILSFLGKVQMEVISALLQEKYHVEIELKEPTVIYME-- 427
L ++++ DP R + T E +++ +G++ ++V+ ++ ++ VE + P V Y E
Sbjct: 426 LQKIAEEDPTFRVETNVETGETVIAGMGELHLDVLVDRMKREFKVEANVGAPQVSYRETF 485

Query: 428 RPLKNAEYTIHIEVPPNPFWASIGLSVSPLPLGSGMQYESSVSLGYLNQSFQNAVMEGIR 487
R A + ++++PGG ++E+++ G + + F AV +G+
Sbjct: 486 RASTQARGFFKRQSGGKGQFGDVWIEFTPNEEGKGFEFENAIVGGVVPREFIPAVEKGLI 545

Query: 488 YGCEQG-LYGWNVTDCKICFKYGLYYSPVSTPADFRMLAPIVLEQVLKKAGTELLEPYLS 546
G L G+ + D K G Y+ S+ F++ A + L++ K A +LEP +
Sbjct: 546 ESMANGVLAGYPMVDVKAKLYDGSYHDVDSSETAFKIAASLALKEAAKSAQPAILEPMML 605

Query: 547 FKIYAPQEYLSRAYNDAPKYCANIVDTQLKNNEVILSGEIPARCIQEYRSDLTFFTNGRS 606
T 7A D±± T. x _i_ "NT To. a.T3 j. V m T. T
Sbjct: 606 TOITAPEDNLGDVMGHVTARRGRVDGMEAHGNSQIVRAYVPLAEMFGYATVLRSATQGRG 665

Query: 607 VCLTELKGY 615
+ Y
Sbjct: 666 TFMMVFDHY 674





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
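
The alignment summary lines in these examples (for instance `Identities = 183/669 (27%), Positives = 305/669 (45%), Gaps = 63/669 (9%)`) can be read mechanically. The following is a minimal sketch in Python, not part of the original analysis; the names `parse_summary` and `recomputed_percent` are hypothetical. It extracts the counts and recomputes a truncated percentage for comparison. The reported value may differ by a point depending on how the alignment program rounded, so the sketch only prints both values side by side.

```python
import re

# One field of a summary line, e.g. "Identities = 183/669 (27%)".
# Optional whitespace is tolerated because the text went through OCR.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (count, alignment_length, reported_percent)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in FIELD.finditer(line)
    }


def recomputed_percent(count, length):
    """Percentage recomputed from the counts, truncated to an integer."""
    return 100 * count // length


summary = parse_summary(
    "Identities = 183/669 (27%), Positives = 305/669 (45%), Gaps = 63/669 (9%)"
)
for name, (count, length, reported) in summary.items():
    print(name, f"{count}/{length}", "reported:", reported,
          "recomputed:", recomputed_percent(count, length))
```

Truncation is an assumption that happens to reproduce the three figures on this particular line; it should not be read as a statement of how the alignment program computes its percentages.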

Example 1356 

A DNA sequence (GBSx1441) was identified in S.agalactiae <SEQ ID 4151> which encodes the amino
acid sequence <SEQ ID 4152>. Analysis of this protein sequence reveals the following:

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2530 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1357 

A DNA sequence (GBSx1442) was identified in S.agalactiae <SEQ ID 4153> which encodes the amino
acid sequence <SEQ ID 4154>. Analysis of this protein sequence reveals the following:

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1358 

A DNA sequence (GBSx1443) was identified in S.agalactiae <SEQ ID 4155> which encodes the amino
acid sequence <SEQ ID 4156>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1630 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1359 

A DNA sequence (GBSx1444) was identified in S.agalactiae <SEQ ID 4157> which encodes the amino
acid sequence <SEQ ID 4158>. This protein is predicted to be an excisionase-related protein. Analysis of this
protein sequence reveals the following:

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4481 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein is similar to transposon Tn916 from S.faecalis. No corresponding DNA sequence was identified
in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1360 

A DNA sequence (GBSx1445) was identified in S.agalactiae <SEQ ID 4159> which encodes the amino
acid sequence <SEQ ID 4160>. This protein is predicted to be a transposase. Analysis of this protein
sequence reveals the following:

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4626 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein is similar to the Tn1545 integrase from S.pneumoniae and to SEQ ID 578.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1361 

A DNA sequence (GBSx1446) was identified in S.agalactiae <SEQ ID 4161> which encodes the amino
acid sequence <SEQ ID 4162>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.72 Transmembrane 18 - 34 ( 13 - 41)
INTEGRAL Likelihood = -6.10 Transmembrane 58 - 74 ( 55 - 79)
INTEGRAL Likelihood = -5.04 Transmembrane 97 - 113 ( 90 - 116)
INTEGRAL Likelihood = -1.81 Transmembrane 78 - 94 ( 78 - 94)
INTEGRAL Likelihood = -0.85 Transmembrane 145 - 161 ( 145 - 161)

Final Results

bacterial membrane Certainty=0.5288 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC74820 GB:AE000270 orf, hypothetical protein [Escherichia coli K12]

Identities = 43/174 (24%) , Positives = 84/174 (47%) , Gaps = 9/174 (5%) 

Query: 24 LIATLVLWYLYKL GILNDSNELKDLVHKYEFWGPMI FIVAQI VQIVFPVI PGG 77
L A L+ + +Y + +L D L+ L+ + F+G ++I+ 1+ + ++PG
Sbjct: 24 LFACLIFALVIYAIHAFGLFDLLTDLPHLQTLIRQSGFFGYSLYILLFIIATLL-LLPGS 82

Query: 78 OTTVAGFLIFGPTLGFIYNYIGIIIGSVILFWLWFYGRKFVLLFM-DQKTFDKYESKLE 136
+ +AG ++FGP LG + + I + S F L ++ GR +L ++ TF E +
Sbjct: 83 ILVIAGGIVTOPLLGTLLSLIAATLASSCSFLLARWLGRDLLLKYVGHSNTFQAIEKGIA 142

Query: 137 TSGYEKFFIFCMASPISPADIMVMITGLSNMSIKRFVTIIMITKPISIIGYSYL 190
+G + F I P+ P +1 GL+ ++ + I +T 1+ Y+ +
Sbjct: 143 RNGID-FLILTRLIPLFPYNIQNYAYGLTTIAFWPYTLISALTTLPGIVIYTVM 195



A related DNA sequence was identified in S.pyogenes <SEQ ID 4163> which encodes the amino acid
sequence <SEQ ID 4164>. Analysis of this protein sequence reveals the following:

Possible site: 43
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -4.30 Transmembrane 8 - 24 ( 6 - 29)
INTEGRAL Likelihood = -0.80 Transmembrane 57 - 73 ( 57 - 73)
INTEGRAL Likelihood = -0.00 Transmembrane 86 - 102 ( 86 - 102)

Final Results

bacterial membrane Certainty=0.2720 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
An alignment of the GAS and GBS proteins is shown below.

Identities = 85/114 (74%), Positives = 101/114 (88%)

Query: 89 PTLGFIYNYIGIIIGSVILFWLVKFYGRKFVLLFMDQKTFDKYESKLETSGYEKFFIFCM 148 

P GFIYNY+GI I IGS+ LF LVK YGRKF+LLF++ KTF KYE +LET GYEK FIFCM 
Sbjct: 3 POTGFIYNYVGIIIGSIALFLLVKTYGRKFILLFVNDKTFYKYERRLETPGYEKLFIFCM 62 



Query: 149 ASPISPADIMVMITGLSNMSIKRFVTIIMITKPISIIGYSYLWIYGGDILKNFL 202 

ASP+SPADIMVMITGL++MS+KRFVTI++ITKPISIIGYSYL+I+G D++ FL 
Sbjct: 63 ASPVSPADIMVMITGLTDMSLKRFVTILLITKPISIIGYSYLFIFGKDVISWFL 116

There is also homology to SEQ ID 1728.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
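
The `INTEGRAL` lines in this example list predicted transmembrane segments with a likelihood score and residue coordinates. As an illustration only, the sketch below (Python; `Segment` and `parse_segments` are hypothetical names, not part of the original analysis) reads such lines into records and orders them along the sequence. It assumes the layout `INTEGRAL Likelihood = <score> Transmembrane <start> - <end> ( <from> - <to>)` seen above and ignores the parenthesised range.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float  # score as printed; the source lists more negative values first
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment


INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return one Segment per INTEGRAL line found in text."""
    return [
        Segment(float(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in INTEGRAL.finditer(text)
    ]


SAMPLE = """\
INTEGRAL Likelihood =-10.72 Transmembrane 18 - 34 ( 13 - 41)
INTEGRAL Likelihood = -6.10 Transmembrane 58 - 74 ( 55 - 79)
INTEGRAL Likelihood = -5.04 Transmembrane 97 - 113 ( 90 - 116)
INTEGRAL Likelihood = -1.81 Transmembrane 78 - 94 ( 78 - 94)
INTEGRAL Likelihood = -0.85 Transmembrane 145 - 161 ( 145 - 161)
"""

segments = parse_segments(SAMPLE)
# Order the segments by position along the protein rather than by score.
for seg in sorted(segments, key=lambda s: s.start):
    print(seg.start, seg.end, seg.likelihood)
```

Sorting by start position is a presentation choice for this sketch; the prediction output itself lists the segments by score.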

Example 1362

A DNA sequence (GBSx1447) was identified in S.agalactiae <SEQ ID 4165> which encodes the amino
acid sequence <SEQ ID 4166>. This protein is predicted to be chloramphenicol acetyltransferase (cat).
Analysis of this protein sequence reveals the following:

Possible site: 28
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4725 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA86871 GB:U19459 VAT B [Staphylococcus aureus] 
Identities = 57/130 (43%) , Positives = 81/130 (61%) , Gaps = 4/130 (3%) 

Query: 57 IGAFCSIAQNVT--ITGLNHPTDHITTNPFIYYKSRGFINEDRADLIDEKKNGKVIIGND 114
IG FC+IA+ + + G NH 4 ITT PF G+ + L D G ++GND
Sbjct: 65 IGKFCAIAEGIEFIMNGANHRMNSITTYPF-NIMGNGW-EKATPSLEDLPFKGDTWGND 122

Query: 115 TOIGTNOTILPSWIGNGAIIGAGSVITKDIPDYAWaGTPAKIIKXRFSEEEITLIiNAS 174
VWIG NVT++P + IG+GAI+ A SV+TKD+P Y ++ G P++IIK RF +E I L
Sbjct: 123 WIGQNVTVMPGIQIGDGAIVAANSVVTKDVPPYRIIGGNPSRIIKKRFEDELIDYLLQI 182

Query: 175 QWWNWSDEAI 184
+WW+WS + I
Sbjct: 183 KWWDWSAQKI 192

There is also homology to SEQ ID 1944. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1363 

A DNA sequence (GBSx1448) was identified in S.agalactiae <SEQ ID 4167> which encodes the amino
acid sequence <SEQ ID 4168>. Analysis of this protein sequence reveals the following:

Possible site: 39
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2398 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1364

A DNA sequence (GBSx1449) was identified in S.agalactiae <SEQ ID 4169> which encodes the amino
acid sequence <SEQ ID 4170>. This protein is predicted to be a cation-transporting P-ATPase PacL.
Analysis of this protein sequence reveals the following:

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 



INTEGRAL Likelihood = -9.18 Transmembrane 873 - 889 ( 866 - 894)
INTEGRAL Likelihood = -8.39 Transmembrane 257 - 273 ( 251 - 276)
INTEGRAL Likelihood = -5.95 Transmembrane 67 - 83 ( 65 - 88)
INTEGRAL Likelihood = -5.41 Transmembrane 282 - 298 ( 281 - 301)
INTEGRAL Likelihood = -1.65 Transmembrane 90 - 106 ( 89 - 107)
INTEGRAL Likelihood = -0.48 Transmembrane 737 - 753 ( 736 - 753)
INTEGRAL Likelihood = -0.… Transmembrane … - 914

Final Results

bacterial membrane Certainty=0.4673 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10963> which encodes amino acid sequence <SEQ ID 
10964> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB85991 GB:AE000912 cation-transporting P-ATPase PacL 
[Methanothermobacter thermoautotrophicus] 
Identities = 409/922 (44%) , Positives = 609/922 (65%), Gaps = 22/922 (2%) 



Query: 10 TNTRFAKEELEEVFEELGTTQGGLSDEEVAVRQKKYGLNLLSEVKQESIILLFLKNFTSL 69
T T + E+EEV + L T++ GL +E R K +G N L EVK+ +ILLFL N ++
Sbjct: 4 TMTAIYELEVEEVLQRLETSESGLDPQEMKRLKIHGPl^EEvKRRPLIIjLFLSNLYNV 63

Query: 70 MAILLWVGGFVAIVSNSLELGLAIWMVNVINGIFSFIQEyRASQATQALEKMLPSYSRVL 129
+A+LLW+ ++ ++ + +L +AI MV +IN +FSF QEY A +A +AL+ +LP +V+
Sbjct: 64 LALLLWIAAILS F I TGNYQLAVAI VMVI I INALFSFWQEYEAEKAAEALKNILPVMVKVI 123

Query: 130 RKGSEEKILSEQLVPGDIVLIEEGDRISADGRLIKTTDLQVNQSALTGESNPIYKDSNVE 189
R E I + +V GDI+++EEGD + AD R++++ +L+V+ SALTGES P+ K S+
Sbjct: 124 RASKEVLIPAADVVHGDIIILEEGDTVPADARILESHNLRVDASALTGESKPVRKVSHPV 183

Query: 190 NDQSKTLIECDNMVFAGTTVSSGSATMWTAIGMQTQFGQIADLTQGMKSEKSPLQRELD 249
+ + 1+ +N++FAGT V+SG+ V A G T+F +IA LTQ ++ E SPLQR++
Sbjct: 184 RE-ADNYIDTENILFAGTQVTSGTGRAAVFATGRDTEFSRIATLTQEVREEPSPLQRQIS 242



Query: 250 RLTKQISIISITVGIIFFIAATFFVKEPVSKSFIFALGMIVAFIPEGLLPTVTLSLAMAV 309

+ I +++ +G+I FL + V+ P+ +FIFA+G++VA +PEGLLP+VTLSLA + 
Sbjct: 243 IAARIIGALAVAMGVILFLVNLYIVRLPLETAFIFAIGLMVANVPEGLLPSVTLSLAASA 302 

Query: 310 QRMAKEHALWKLSSVETLGATSVICSDKTGTLTQNEMTVNHLWQNGKSYQVTGLGYAPE 369 

++MA+E+ALVK+LSSVETLG+T++IC+DKTGTLT+ EMTV +W K +VTG GY PE 
Sbjct: 303 RKMARENALVT<RLSSvETLGSTTIICTDKTGTLTRGEMTVRKIWIPHKVIEvTGSGYRPE 362 

Query: 370 GQILFEGDNICFGNSDRGDLEKLIRFAHLCSNAQVLPPNDDRSTYTVLGDPTEACLNVLL 429 

GQ LF G+ + + D +L+ L+R A C+++ ++ + ++VLGD TE L V 
Sbjct: 363 GQFLFRGEPV- - SHRDMAELKLLMRAATFCNDSALI HEEGEWS VLGDSTEGALLVAA 417 

Query: 430 EKSGINIQENRKFAPRLKELPFDSVRKRMTTIHSLGGDEKDKKISITKGAPKEILDLSDY 489 

EK G + + K PR+ ELPFDS RK MT+IH G K+++ KGAPK+I+ LS+ 
Sbjct: 418 EKLGFDAEAELKAMPRITELPFDSRRKSMTSIHEKSG KRVAYVKGAPKKI IGLSER 473 

Query: 490 VLSDGKVIPLNKEERNKIQLANDTFAKTJGLRVLAVSYCDIEGFSKEQWTQENLEQHMVFI 549 

+ DG+V L+ +E+ +1 +D A GLRVLA +Y ++ E +E+ +V + 

Sbjct: 474 ISVDGRVRALHADEKERIIGIHDEMASKGI^VIjAFAYRELPE-DLEVRDPGEVERDLVLV 532

Query: 550
G+ AM DPPREGV+EA++ C A IRIIM+TGDYGLTA +IA+ IGI+ + ++I G E
Sbjct: 533

Query: 610
L ++ D++L+ L+ E ++FAR PE K R+ ++L++ E+VA+TGDGVNDAPAL+K+D
Sbjct: 592

Query: 668
IGVAMG +GTDVAKE+AD++L DD+FASIV AV EGR VY+NI+KF+TYIF+ T E VP
Sbjct: 652

Query: 728
F + IPLP+T+MQILA+DLGTD LPAL LG PE+DVM PPR ++RLL++
Sbjct: 711 --FIMMVLFSIPLPITIMQILAIDLGTDTLPALALGRSLPESDVMKLPPRAPSERLLNRE 768

Query: 788
++++ +L+ GTIE+ h M +F Y G + A+ Y ATT+ 1+ +Q
Sbjct: 769

Query: 845
+G +++S+T S + N+ I G++ I L+++Y+P +F TA G+ W
Sbjct: 827

Query: 905
LI 1+ DE+RK
Sbjct: 887

There is also homology to SEQ ID 4172.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
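
Each example ends its localisation prediction with a three-line "Final Results" block. The sketch below is an illustration in Python, not the tool that produced the output; `parse_final_results` and `best_location` are hypothetical names. It reads such a block and picks the compartment with the highest certainty. The pattern tolerates stray spaces inside the number (as in `Certainty=0 . 4673`) because this text was recovered by OCR.

```python
import re

# One "Final Results" line: compartment, certainty value, verdict in parentheses.
RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*([\d .]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return a list of (compartment, certainty, verdict) tuples."""
    results = []
    for m in RESULT.finditer(text):
        certainty = float(m.group(2).replace(" ", ""))
        results.append((m.group(1), certainty, m.group(3).strip()))
    return results


def best_location(text):
    """Return the tuple with the highest certainty, or None if nothing parsed."""
    results = parse_final_results(text)
    return max(results, key=lambda r: r[1]) if results else None


SAMPLE = """\
bacterial membrane Certainty=0.4673 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
"""

print(best_location(SAMPLE))
```

Lines that OCR has damaged beyond this pattern (for example a corrupted `=` sign) are simply skipped, so a caller should check how many lines were parsed before trusting the result.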

Example 1365 

A DNA sequence (GBSx1450) was identified in S.agalactiae <SEQ ID 4173> which encodes the amino
acid sequence <SEQ ID 4174>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3740 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB46979 GB:AJ243482 CSRA protein [Enterococcus faecalis]
Identities = 85/132 (64%), Positives = 105/132 (79%)



Query: 2 KETQEELRQRIGHTAYQVTQNSATEHAFTGKYDDFFEEGIYVDIVSGEVLFSSLDKFQSG 61 

K T+EEL+Q + Y VTQ +ATE F+G+YDDF+++GIYVDIVSGE LFSSLDK+ +G 
Sbjct: 3 KPTEEELKQTLTDLQYAVTQENATERPFSGEYDDFYQDGIYVDIVSGEPLFSSLDKYDAG 62 

Query: 62 CGWPAFSKPIFjNRMVTNHQDHSHGMHRIEVRSRQADSHLGHVFNDGPVDAGGLRYCINSA 121 

CGWP+F+KPIE R V D SHGMHR+EVRS++ADSHLGHVF DGP+ GGLRYCIN+A 
Sbjct: 63 CGWPSFTKPIEKRGVKEKADFSHGMHRVEVRSQEADSHLGHVFTDGPLQEGGLRYCINAA 122 

Query: 122 ALDFIPYDQMAK 133

AL F+P + K 
Sbjct: 123 ALRFVPVADLEK 134 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4175> which encodes the amino acid
sequence <SEQ ID 4176>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3692 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 109/142 (76%), Positives = 126/142 (87%)

Query: 3 ETQEELRQRIGHTAYQVTQNSATEHAFTGKYDDFFEEGIYVDIVSGEVLFSSLDKFQSGC 62 

ET +EL+QRIG +Y+VTQ++ATE FTG+YD+FFE+GIYVDIVSGEVLFSSLDKF SGC 
Sbjct: 2 ETSDELKQRIGDLSYEVTQHAATESPFTGEYDNFFEKGIYVDIVSGEVLFSSLDKFNSGC 61 


Query: 63 GWPAFSKPIENRMVTNHQDHSHGMHRIEVRSRQADSHLGHVFNDGPVDAGGLRYCINSAA 122 

GWPAFSKPIENRMVTNH D S+GM R+EV+SR+A SHLGHVF+DGP +AGGLRYCINSAA 
Sbjct: 62 GWPAFSKPIENRMVTNHDDSSYGMRRVEVKSREAGSHLGHVFSDGPKEAGGLRYCINSAA 121 

Query: 123 ALDF I PYDQMAKRGYGDYLSLFD 144

L FIPYDQM K GY +L+LFD 
Sbjct: 122 LKFIPYDQMEKEGYAQWLTLFD 143 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1366 

A DNA sequence (GBSx1451) was identified in S.agalactiae <SEQ ID 4177> which encodes the amino
acid sequence <SEQ ID 4178>. Analysis of this protein sequence reveals the following:

Possible site: 25
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1674 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



>GP:BAB05127 GB:AP001511 unknown [Bacillus halodurans] 
Identities = 48/152 (31%) , Positives = 77/152 (50%) , Gaps = 1/152 (0%) 

Query: 1 MIRRAKEKDLPDIAELLKQILMLHHEVRPDIFHTRGSKFSKEQLKEMLIDESKPIFVYES 60 

+ IR A +D ++A L Q+ H + R DIF + + + + E + V+ 

Sbjct: 2 IIREAWQDYEEVARLHTQVHEAHVKERGDIFRSNEPTLNPSFFQAAVQGEKSTVLVFVD 61 

Query: 61 DEGKWAHLFLQLQEKRDLPR-KSFKTLYIDDLCIDEEVRGQQIGQKLMDFARQYAKKHG 119

+ K+ A+ + L + LP + KT+YI DLC+DE RG IG+ 1 + + Y K H 
Sbjct: 62 EREKIGAYSVIHLVQTPLLPTMQQRKTVYISDLCVDETRRGGGIGRLIFEAIISYGKAHQ 121 

Query: 120 CYNITLNVWNDNQRAVSFYEKLGFKPQQTQME 151 
I L+V++ N RA +FY LG + Q+ ME

Sbjct: 122 VDAIELDVYDFNDRAKAFYHSLGMRCQKQTME 153 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1367

A DNA sequence (GBSx1452) was identified in S.agalactiae <SEQ ID 4179> which encodes the amino
acid sequence <SEQ ID 4180>. Analysis of this protein sequence reveals the following:

Possible site: 52
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3285 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9785> which encodes amino acid sequence <SEQ ID 9786> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06554 GB:AP001516 unknown conserved protein [Bacillus halodurans]
Identities = 108/211 (51%), Positives = 149/211 (70%)

Query: 7 EDVII^ATENMVBHKLKNDPSGHDWFHITOVRNIAvELAHKEGANTFI CQMAALLHDI ID 66 
E IL + E V +L ++ SGHDW+HI RV +A + +E + F+ Q+AAL HD+ID 
Sbjct: 3 EQAILQSAEAWVKKQLMDEYSGHDWYHIRRVTLMAKAIGEQEKVDVFVVQIAALFHDLID 62

Query: 67 DKICQDSKQASYELTQWLYSQDIAIAEVEHILDILENISFKAGTGLTMKTLEGQIVQDAD 126 

DK+ D + A +L W+ + + +++H +DI+ ISFK G G ++ T E +VQDAD 
Sbjct: 63 DKLVDDPETAKQQLIDWMEAAGVPSQKIDHTMDIINTISFKGGHGQSLATREAMWQDAD 122 


Query: 127 RLDAMGAIGIARTMAYSGSKGRLIHDPNLKPRENLTLEEYRNGQDTAIIHFYEKLLKLKD 186 

RLDA+GAIGIART AYSG+KG+ I+DP L RE +T+EEYR+G+ TAI HFYEKL KLKD 
Sbjct: 123 RLDALGAIGIARTFAYSGNKGQPIYDPELPIRETMTVEEYRHGKSTAINHFYEKLFKLKD 182 

Query: 187 LMNTKQGKMLAQKRHDFLELYLAEFYAEWNG 217

LMNT+ GK LA++RH F+E ++ F +EWNG 
Sbjct: 183 LMNTETGKQLAKERHVFMEQFIERFLSEWNG 213 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1368 

A DNA sequence (GBSx1453) was identified in S.agalactiae <SEQ ID 4181> which encodes the amino
acid sequence <SEQ ID 4182>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> May be a lipoprotein

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

!GB:U25448 internalin [Listeria monocytogenes] 

!GB:U25448 internalin [Listeria monocytogenes]

!GB:U25448 internalin [Listeria monocytogenes] 

!GB:U25448 internalin [Listeria monocytogenes] 

>GP:AAA69530 GB:U25448 internalin [Listeria monocytogenes] 
Identities = 78/253 (30%), Positives = 132/253 (51%), Gaps = 2/253 (0%)



WO 02/34771 PCT/GB01/04789



Query: 531 LKQLWMTNTGITDYSFLDKMPLLEGLDISQNGIKDLSFLTKYKQLSLIAAftlJNGITSLKP 590 

L Q+ +N +TD + L + L +++NI D++ L L+ + NN IT + P 

Sbjct: 26 LTQINFSTOQLTDITPLKDLTKLVDILMMOTQIMITPIANLSNLTGLTLFNNQITDIDP 85 

Query: 591 LAELPNLQFLVLSHmiSDLTPLSNLTKLQELYIBHl^KNLSALSGKKDLKVLDLSNNK 650 

L L NL L LS N ISD++ LS LT LQ+L L N V +L L+ L+ LD+S+NK 
Sbjct: 86 LKHLTNIiNRLELSSOTISDISMjSGLTSLQQLSLG-NQVTDLKPLANLTTLERLDISSNK 144 

Query: 651 SADLSTL-KTTSLETLLI^NETNTSNLSFLKQNPKVSNLTINNAKLASLDGIEESDEIVKV 709 

+D+S L K T+LE+L+ S+++ L + L++N +L + + + + 

Sbjct: 145 VSDISVIAKLTNLESLIATNNQISDITPLGILTNIi^ 204 

Query: 710 EAEGNQIKSLVLKNKQGSLKFLNVTNNQLTSLEGVNNYTSLETLSVSKNKLESLDIKTPN 769 

+ NQI +L L L + NQ++++ + T+L L +++N+LE + + 

Sbjct: 205 DLANNQISNLAPLPGLTKLTELKLGANQISNIXPLAGLTALTNLELNENQLEDISPISNL 264 

Query: 770 KTVTNLDFSHNNV 782 

K +T L NN+ 
Sbjct: 265 KNLTYLTLYFNNI 277 
Identities = 91/300 (30%) , Positives = 141/300 (46%) , Gaps = 42/300 (14%) 

Query: 519 INDMTPVLQFKKLKQLWMTNTGITDYSFLDKMPLLEGLDISQNGIKD LSFLTKYKQL 575 

I D+TP+ L L + N ITD L + L L++S N I D LS LT +QL 

Sbjct: 58 IADITPLiANLSNLTGLTLFNMQITDIDPLKlttTNLNRLELSSNTISDISALSGLTSLQQL 117 

Query: 576 SLIAAANNGITSLKPLA ELPNLQFLVLSHNNISDLTPL 613 

SL N +T LKPLA +L NL+ L+ ++N ISD+TPL 

Sbjct: 118 SL GNQVTDLKPLANLTTLERLDISSNCTSDISVIAKLTNLESLIATNNQISDITPL 173 

Query: 614 SNLTKLQELYLDHNNVKNLSALSGKKDLK\^ 672 

LT L EL L+ N +K++ L+ +L LDL+NN+ ++L+ L T L L L 
Sbjct: 174 GILTNLDELSIiNGNQLKDIGTLASLTI^TDLDIAITOQISNIAPLPGLTKLTELKLGANQI 233 

Query: 673 SNLSFLKQNPKVSNLTIOTAKLASLIXSIEESDEIVKvEAEGNQIKSLvIjKNKQGSLKFLN 732 

SN+ L ++NL +N+L + I + + NI++ L+L 

Sbjct: 234 SNIXPIAGLTALTNLELNENQLEDISPISNLKNLTYLTLYFNNISDISPVSSLTKLQRLF 293 

Query: 733 VTNNQLTSLEGVNNYTSLETLSVSKNKLESLDIKTPNKTvTNLDFSH^ 792 
NN+++ + + N T++ LS N++ L TP +T + +QL LN++ 

Sbjct: 294 FYNNKVSDVSSLANLTNINWLSAGHNQISDL TPLANLTRI TQLGLNDQ 341 

Identities = 73/253 (28%) , Positives = 124/253 (48%) , Gaps = 4/253 (1%) 

Query: 540 GITDYSFLDKMPLLEGLDISQNGIKDLSFLTKYKQLSLIAAANNGITSLKPLAELPNLQF 599 

GI L+ + L ++ S N + D++ L +L I NN I + PLA L NL 

Sbjct: 13 GIKSIDGLEYI^NNLTQINFSNNQLTDITPLKDLTKLVDILMNNNQIADITPLANLSNLTG 72 

Query: 600 LVIjSHNNISDLTPLSNLTKLQELYLDHNNVKNLSALSGKKDLKVLDLSNNKSADLSTLKT 659 

L L +N I+D+ PL NLT L L L N + ++SALSG L+ L L N + 
Sbjct: 73 LTLFNNQITDIDPLKNLTNLNRLELSSNTISDISALSGLTSLQQLSLGNQVTDLKPLANL 132 

Query: 660 TSLETLLIjNETNTSNLSFLKQNPKVSNLTII^AKIASLDGIEESDEIvTCVEAEGNQIKSL 719 

T+LE L ++ S++S L + + +L N +++ + + + ++ GNQ+K + 

Sbjct: 133 TTLERLDISSNKVSDISVIAKLTNLESLIATNNQISDITPLGILTNLDELSLNGNQLKDI 192 

Query: 720 VIjKNKC^SLKFLNVTNNQLTSLEGvNNYTSLETLSVSKNKLESLDIKTPNKTVTN^ 779 

+L L++ NNQ+++L + T L L + N++ ++ +TNL+ + 

Sbjct: 193 GTIASLTNLTDLDIANNQISNLRPLPGLTKLTELKLGANQISNIXPIAGLTALTNLELNE 252 

Query: 780 NNV PTSQLK 788 

N + P S LK 
Sbjct: 253 NQLEDISPISNLK 265 
Identities = 56/209 (26%) , Positives = 115/209 (54%) , Gaps = 2/209 (0%) 

Query: 575 LSLIAAANNGITSLKPIAELPNLQFLvIjSHNNISDLTPLSNLTKLQELYLDHNNVKNLSA 634 

++ + A GI S+ L L NL + S+N ++D+TPL +LTKL ++ +++N + +++ 
Sbjct: 4 WTLQADRLGIKSIDGLEYIjNNLTQINFSNNQLTDITPLKDLTKLvDILMNNNQIADITP 63

Query: 635 LSGKKDIiKTODLSNNKSADLSTLKT-TSLETLLl^™TSNLSFLKQNPKVSNLTINNAK 693
L+ +L L L NN+ D+ LK T+L L L+ S++S L + L++ N +
Sbjct: 64 LANLSNLTGLTLFIMQITDIDPLKNLTNIjNRLELSSNTISDISALSGLTSLQQLSLGN-Q 122

Query: 694 LASLDGIEESDEIVKA^AEGNQIKSLVLKI^QGSLKFIJmTOQLTSLEGVimTSLETL 753
+ L + + +++ N++ + + K +L+ L TNNQ++ + + T+L+ L
Sbjct: 123 VTDLKPLANLTTLERLDISSNKViSDISVIiRKLTNLESLIATNNQISDITPLGIIiTKIjDEL 182

Query: 754 SVSKNKLESLDIKTPNKTVTNLDFSHNNV 782
S++ N+L+ + +T+LD ++N +
Sbjct: 183 SLNGNQLKDIGTLASLTNLTDLDLANNQI 211
Identities = 61/228 (26%) , Positives = 118/228 (51%) , Gaps = 3/228 (1%) 

Query: 483 mTVTKINIGQRTNPFQRFGLSLMPNIEVLGIGFTPIM)MTPVLQFKKLKQLWMTNTGIT 542
L ++ ++++G + + L+ + +E L I ++D++ + + L+ L TN 1+
Sbjct: 111 LTSLQQLSLGNQVTDLKP- -LANLTTLERLDISSNKVSDISVLAKLTNLESLIATNNQIS 168

Query: 543 DYSFLDKMPLLEGLDISQNGIKDLSFLTKYKQLSLIAAANNGITSLKPLAELPNLQFLVL 602
D + L + L+ L ++ N +KD+ L L+ + ANN I++L PL L L L L
Sbjct: 169 DITPLGILTNLDELSLNGNQLKDIGTIASLTNLTDLDIAl^QISNLAPLPGLTKLTELKL 228

Query: 603 Sffl^ISDLTPLSNLTKLQELYLDHNNvia^SALSGKmLKVLDLSNNKSADLSTLKT-TS 661
N IS++ PL+ LT L L L+ N ++++S +S K+L L L N +D+S + + T
Sbjct: 229 GANQISNIXPLAGLTALTNLELNENQLEDISPISNLKNLTYLTLYFNNISDISPVSSLTK 288

Query: 662 LETLLLNETNTSNLSFLKQNPKVSISILTINNAKLASLDGIEESDEIVKV 709 

L+ L S++S L ++ L+ + +++ L + I ++ 

Sbjct: 289 LQRLFFYNNKVSDVSSLANLTNIITOILSAGHNQISDLTPLANLTRITQL 336 
Identities = 60/286 (20%) , Positives = 129/286 (44%) , Gaps = 24/286 (8%) 

Query: 369 SNKLSDEDQKKLIYLAEKLGLNPNQIEVLTSEDGSIIFKYPHDDHSHTIASKDIEIGKPI 428
+N+++D D K + +L L+ N I +++ G + + + +G +
Sbjct: 77 NNQITDIDPLKNLTNLNRLELSSNTISDISALSG LTSLQQLSLGNQV 123

Query: 429 PDGHHDHSHAKDKVGMATLKQIGFDDEIIQDILHADAPTPFPSNETNPEKMRQW--LATV 486
D K + TL+++ + DI T S ++ L +
Sbjct: 124 TD LKPLANLTTLERLDISSNKVSDISVLAKLTNLESLIATNNQISDITPLGIL 176

Query: 487 TKIN-IGQRTNPFQRFG-LSLMPNIEVLGIGFTPINDMTPVLQFKKLKQLWMTNTGITDY 544
T ++ + N + G L+ + N+ L + I+++ P+ KL +L + I++
Sbjct: 177 TNLDELSmGNQLKDIGTLASLTNLTDLDLANNQISNIAPLPGLTKlTELKLGANQISNI 236

Query: 545 SFLDKMPLLEGLDISQNGIKDLSFLTKYKQLSLIAAANNGITSLKPLAELPNLQFLVLSH 604
L + L L++++N ++D+S ++ K L+ + N 1+ + P++ L LQ L +
Sbjct: 237 XPLAGLTALTNLELNENQLEDISPISNLKNLTYLTLYFNNISDISPVSSLTKLQRLFFYN 296

Query: 605 NNISDLTPLSNLTKLQELYLDHNNVKNLSALSGKKDLKVLDLSNNK 650 

N +SD++ L+NLT + L HN + +L+ L+ + L L++ + 
Sbjct: 297 NKVSDVSSLANLTNINWLSAGHNQISDLTPLANLTRITQLGLNDQE 342 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4183> which encodes the amino acid
sequence <SEQ ID 4184>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> May be a lipoprotein

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAA69530 GB:U25448 internalin [Listeria monocytogenes] 
Identities = 88/279 (31%) , Positives = 149/279 (52%) , Gaps = 2/279 (0%) 

Query: 419 LPNLETLGIGFTPIKDISPVLQFKKLKQLIOTKTGVTDYRFLDNMPQLEGIDISQNNLKD 478

L + TL IK I + L Q+ + +TD L ++ +L I ++ N + D 

Sbjct: 1 LDXVTTLQADRLGIKSIDGLEYUSfflLTQra^ 60 

Query: 479 ISFLSKYKNLTLVAAaDNGIEDIRPLGQLPNLKFLVLSNNKISDLSPLASLHQLQELHID 538
1+ L+ NLT + +N I DI PL L NL L LS+N ISD+S L+ L LQ+L +
Sbjct: 61 ITPLA]S^JSNLTGLTLFNNQITDIDPIlKNLT^^^NRLELSSNTISDISALSGLTSLQQLSL- 119

Query: 539 NNQITDLSPVSHKESLTVVDLSRNADVDLATL-QAPKIiETLMVMiTKySHLDFLKimPNL 597
NQ+TDL P+++ +L +D+S N D++ L + LE+L+ + ++S + L NL
Sbjct: 120 GNQVTDLKPLANLTTLERLDISSNKVSDISVIJ^TNLESLIATNNQISDITPLGILTNL 179

Query: 598 SSLSINRAQLQSLEGIEASSVIVRVEAEGNQIKSLVLKDKQGSLTFLDVTGNQLTSLEGV 657
LS+N QL+ + + + + + ++ NQI +L LT L + NQ++++ +
Sbjct: 180 DELSLNGNQLKDIGTLASLTNLTDLDLANNQISNLAPLPGLTKLTELKLGANQISNIXPL 239

Query: 658 NNFTALDILSVSKNQLTNVNLSKPNKTVTNIDISHNNIS 696
TAL L +++NQL +++ K +T + + NNIS
Sbjct: 240 AGLTALTNLELNENQLEDISPISNLKNLTYLTLYFNNIS 278

An alignment of the GAS and GBS proteins is shown below. 

Identities = 346/753 (45%) , Positives = 472/753 (61%) , Gaps = 63/753 (8%) 

Query: 187 SRLGNQSNSHYRVNSSK IAGLHYPTSNGFLFNGRG-IKGTTPTGILVEHHNH 237
SR G SN + SK +AG+ +PT +GF+ I T GI+V+H H
Sbjct: 38 SRKGMTSNKIKPIKKSKKTNKTHKGVAGVDFPTDDGFILTKDSKILSKTDQGIWDHDGH 97

Query: 238 LHFISFADLRKGGW GS IADRYQPQKKAD SKKQS PS SKKPRTENTLPKD I - - KDK 289
HFI +ADL+ + G+ + ++A S+ S + P DI +D
Sbjct: 98 SHFIFYADLIffiSPFEYLIPEGMLAKPAVRQRAASQGTSKVADPHHHYEFNPADIVAEDA 157

Query: 290 LAYLARE LHLDI SRIRVLKTLNGEIGFEYPHDDHT 324
LYR ' H + S+ TNGG+PD
Sbjct: 158 LGYTTOHDDHFHYILKSSLSGQTQAQAKQVATRLPQTSSLVSTATANGIPGLHFPTSDGF 217

Query: 325 HVIMAKDIDLSKPIPNPHHDDEDH HKGHHHD- - -ESDHKHEEHEHTK 368
+ ++K HD H H +D +++ E H+ +
Sbjct: 218 QEWGQGIVGVTKDSILVDHDGHIjHPISFADLRQGGWAHVADQYDPAKKAEKPAETHQTPE 277

Query: 369 SNKLSDEDQKKLIYLAEKLGLNPNQIEVLTSEDGSIIFKYPHDDHSHTIASKDIEIGKPI 428
++ E Q+KL YLAEKLG++P+ 1+ + ++DG + +YPH DH+H + DIEIGK I
Sbjct: 278 LSEREKEYQEKIAYIAEKLGIDPSTIKRVETQDGKLGLEYPHHDHAHVLMLSDIEIGKDI 337

Query: 429 PDGH HDHSHAKDKVGMATLKQIGFDDEIIQDILHA-DAPTPFPSNETNPEKMRQWLA 484
PD H H K KVGM TL+ +GFD+E+I DI+ DAPTPFPSNE +P M++WLA
Sbjct: 338 PDPHAIEHARELEKHKVGMDTLRALGFDEEVILDIVRTHDAPTPFPSNEKDPNMMKEWLA 397

Query: 485 TVTKINIGQRTNPFQRFGLSLMPNIEVLGIGFTPINDMTPVLQFKKLKQLWMTNTGITDY 544
TV K+++G R +P QR GLSL+PN+E LGIGFTPI D++PVLQFKKLKQL MT TG+TDY
Sbjct: 398 TVIKLDLGSRKDPLQRKGLSLLPNLETLGIGFTPIKDISPvLQFKKLKQLLMTKTGVTDY 457

Query: 545 SFLDKMPLLEGLDISQNGIKDLSFLTKYKQLSLIAAANNGITSLKPLAELPNLQFLVLSH 604
FLD MP LEG+DISQN +KD+SFL+KYK L+L+AAA+NGI ++PL +LPNL+FLVLS+
Sbjct: 458 RFLDNMPQLEGIDISQNNLKDISFLSKYKNLTLVAAADNGIEDIRPLGQLPNLKFLVLSN 517

Query: 605 NNISDLTPLSNLTKLQELYLDHNIWKNLSALSGKKDLICVLDLSNNKSADLSTLKTTSLET 664
N ISDL+PL++L +LQEL++D+N + +LS +S K+ L V+DLS N DL+TL+ LET
Sbjct: 518 NKISDLSPLASLHQLQELHID1OTQITDLSPVSHKESLTVVDLSRNADVDLATLQAPKLET 577

Query: 665 LLIMTOTSIS&SFLKQNPKVSNLTINNAKIMLDGIEESDEI 724
L++N+T S+L FLK NP +S+L+IN A+L SL+GIE S IV+VEAEGNQIKSLVLK+K
Sbjct: 578 LMV1TOTKVSHLDFLKNNPNLSSLSINRAQLQSLEGIEASSVIW\7EAEGNQIKSLVLKDK 637

Query: 725 QGSLKFLNVTNNQLTSLEGVNNYTSLETLSVSKNKLESLDIKTPN^ 784
QGSL FL+VT NQLTSLEGVNN+T+L+ LSVSKN+L ++++ PNKTVTN+D SHKN+
Sbjct: 638 QGSLTFLDTOGNQLTSLEGVNNFTALDILSVSKNQLTNVNLSKPNKTVTNIDISHNNISL 697

Query: 785 SQLKIOTKNIPEAVAKNFPAVVBGSMVGNGSLAEK^y^SKEDKQVSD-ISPIOTQKNTEKS 843
+ LKLNE++IPEA+AKNFPAV EGSMVGNG+ EKAftMA+K + + + +H N +
Sbjct: 698 ADLKIlNEQHIPEAIAKMFPAVYEGS^WGNGTAEEKAA^^TKAKESAQEASESHDYNHNHT 757

Query: 844 AQANADSKKENPKTHDEHHDHEETDHAHVGHHH 876
+ E+ D H+HE+ + A +H
Sbjct: 758 YEDEEGHAHEHRDKDDHDHEHEDENEAKDEQNH 790

SEQ ID 4182 (GBS84) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 16 (lane 9; MW 97.6 kDa). 

GBS84-His was purified as shown in Figure 194, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1369 

A DNA sequence (GBSx1454) was identified in S.agalactiae <SEQ ID 4185> which encodes the amino 
acid sequence <SEQ ID 4186>. This protein is predicted to be GTP-binding protein LepA (lepA). Analysis of 
this protein sequence reveals the following: 



Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

----- Final Results ----- 

bacterial cytoplasm --- Certainty=0.1962 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14493 GB:Z99117 GTP-binding protein [Bacillus subtilis] 
Identities = 464/603 (76%), Positives = 540/603 (88%) 

Query: 8 KRQEKIRNFSIIAHIDHGKSTLADRILEKTETVSSREMQAQLLDSMDLERERGITIKLNA 67 

+RQ +IRNFSIIAHIDHGKSTLADRILEKT ++ REM+ QLLDSMDLERERGITIKLN+ 
Sbjct: 9 ERQSRIRNFSIIAHIDHGKSTLADRILEKTSAITQREMKEQLLDSMDLERERGITIKLNS 68 

Query: 68 IELNYTAKDGETYIFHLIDTPGHVDFTYEVSRSIAACEGAILVVDAAQGIEAQTLANVYL 127 
++L Y AKDGE YIFHLIDTPGHVDFTYEVSRSIAACEGAILVVDAAQGIEAQTLANVYL 

Sbjct: 69 VQLKYKAKDGEEYIFHLIDTPGHVDFTYEVSRSIAACEGAILVVDAAQGIEAQTLANVYL 128 

Query: 128 ALDNDLEILPVINKIDLPAADPERVRAEVEDVIGLDASEAVLASAKAGIGIEEILEQIVE 187 
ALDNDLEILPVINKIDLP+A+PERVR EVEDVIGLDASEAVLASAKAGIGIEEILEQIVE 
Sbjct: 129 ALDNDLEILPVINKIDLPSAEPERVRQEVEDVIGLDASEAVLASAKAGIGIEEILEQIVE 188 

Query: 188 KVPAPTGEVDAPLQALIFDSVYDAYRGVILQVRIVNGMVKPGDKIQMMSNGKTFDVTEVG 247 

KVPAPTG+ +APL+ALIFDS+YDAYRGV+ +R+V G VKPG KI+MM+ GK F+VTEVG 
Sbjct: 189 KVPAPTGDPEAPLKALIFDSLYDAYRGVVAYIRVVEGTVKPGQKIKMMATGKEFEVTEVG 248 

Query: 248 IFTPKAVGRDFLATGDVGYIAASIKTVADTRVGDTITLANNPAIEPLHGYKQMNPMVFAG 307 

+FTPKA + L GDVG++ ASIK V DTRVGDTIT A NPA E L GY+++NPMV+ G 
Sbjct: 249 VFTPKATPTNELWGDVGFLTASIKNVGDTRVGDTITSAANPAEEALPGYRKLNPMVYCG 308 

Query: 308 LYPIESNKYNDLREALEKLQLNDASLQFEPETSQALGFGFRCGFLGLLHMDVIQERLERE 367 

LYPI++ KYNDLREALEKL+LND+SLQ+E ETSQALGFGFRCGFLG+LHM++IQER+ERE 
Sbjct: 309 LYPIDTAKYNDLREALEKLELNDSSLQYEAETSQALGFGFRCGFLGMLHMEIIQERIERE 368 

Query: 368 FNIDLIMTAPSVVYHVNTTDGEMLEVSNPSEFPDPTRVDSIEEPYVKAQIMVPQEFVGAV 427 
FNIDLI TAPSV+Y V TDGE + V NPS PDP +++ +EEPYVKA +MVP ++VGAV 

Sbjct: 369 FNIDLITTAPSVIYDVYMTDGEKVVVDNPSNMPDPQKIERVEEPYVKATMMVPNDYVGAV 428 



Query: 428 MELAQRKRGDFVTMDYIDDNRVNVIYQIPLAEIVFDFFDKLKSSTRGYASFDYEISEYRR 487 
MEL Q KRG+F+ M Y+D NRV++IY +PLAEIV++FFD+LKSST+GYASFDYE+ Y+ 
Sbjct: 429 MELCQGKRGNFIDMQYLDANRVSIIYDMPIAEIVYEFFDQLKSSTKGYASFDYELIGYKP 488 

Query: 488 SQI^KMDILLNGDKVDALSFIVHKEFAYERGKLIVDKLKKIIPRQQFEVPIQAAIGQKIV 547 

S+L KMDI+LNG+K+DALSFIVH+++AYERGK+IV+KLK++IPRQQFEVP+QAAIGQKIV 
Sbjct: 489 SKLVKMDIMLNGEKIDALSFIVHRDYAYERGKVIVEKLKELIPRQQFEVPVQAAIGQKIV 548 

Query: 548 ARSDIKALRKNVLAKCYGGDVSRKRKLLEKQKAGKKRMKAIGSVEVPQEAFLSVLSMDDD 607 

ARS IKA+RKNVLAKCYGGD+SRKRKLLEKQK GK+RMK +GSVEVPQEAF++VL MDD 
Sbjct: 549 ARSTIKAMRKNVLAKCYGGDISRKRKLLEKQKEGKRRMKQVGSVEVPQEAFMAVLKMDDS 608 

Query: 608 DKK 610 
KK 

Sbjct: 609 PKK 611 

A related GBS sequence was identified <SEQ ID 10775> which encodes the amino acid sequence <SEQ ID 
10776>. A further related GBS nucleic acid sequence <SEQ ID 10955> which encodes amino acid 
sequence <SEQ ID 10956> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4187> which encodes the amino acid 
sequence <SEQ ID 4188>. Analysis of this protein sequence reveals the following: 
Possible site: 30 
>>> Seems to have no N-terminal signal sequence 

----- Final Results ----- 

bacterial cytoplasm --- Certainty=0.1829 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:CAB14493 GB:Z99117 GTP-binding protein [Bacillus subtilis] 
Identities = 463/603 (76%), Positives = 542/603 (89%) 

Query: 8 KRQEKIRNFSIIAHIDHGKSTLADRILEKTETVSSREMQAQLLDSMDLERERGITIKLNA 67 

+RQ +IRNFSIIAHIDHGKSTLADRILEKT ++ REM+ QLLDSMDLERERGITIKLN+ 
Sbjct: 9 ERQSRIRNFSIIAHIDHGKSTLADRILEKTSAITQREMKEQLLDSMDLERERGITIKLNS 68 

Query: 68 IELNYTAKDGETYIFHLIDTPGHVDFTYEVSRSIAACEGAILVVDAAQGIEAQTLANVYL 127 

++L Y AKDGE YIFHLIDTPGHVDFTYEVSRSIAACEGAILVVDAAQGIEAQTLANVYL 
Sbjct: 69 VQLKYKAKDGEEYIFHLIDTPGHVDFTYEVSRSLAACEGAILVVDAAQGIEAQTLANVYL 128 

Query: 128 ALDNDLEILPVINKIDLPAADPERVRHEVEDVIGLDASEAVLASAKAGIGIEEILEQIVE 187 
ALDNDLEILPVINKIDLP+A+PERVR EVEDVIGLDASEAVLASAKAGIGIEEILEQIVE 
Sbjct: 129 ALDNDLEILPVINKIDLPSAEPERVRQEVEDVIGLDASEAVLASAKAGIGIEEILEQIVE 188 

Query: 188 KVPAPTGDVDAPLQALIFDSVYDAYRGVILQVRIVNGIVKPGDKIQMMSNGKTFDVTEVG 247 

KVPAPTGD +APL+ALIFDS+YDAYRGV+ +R+V G VKPG KI+MM+ GK F+VTEVG 
Sbjct: 189 KVPAPTGDPEAPLKALIFDSLYDAYRGVVAYIRVVEGTVKPGQKIKMMATGKEFEVTEVG 248 

Query: 248 IFTPKAVGRDFLATGDVGYVAASIKTVADTRVGDTVTLANNPAKEALHGYKQMNPMVFAG 307 

+FTPKA + L GDVG++ ASIK V DTRVGDT+T A NPA+EAL GY+++NPMV+ G 
Sbjct: 249 VFTPKATPTNELOTGDVGFLTASIKNVGDTRVGDTITSAANPAEEALPGYRKLNPMVYCG 308 

Query: 308 IYPIESNKYNDLREALEKLQLNDASLQFEPETSQALGFGFRCGFLGLLHMDVIQERLERE 367 

+YPI++ KYNDLREALEKL+LND+SLQ+E ETSQALGFGFRCGFLG+LHM++IQER+ERE 
Sbjct: 309 LYPIDTAKYNDLREALEKLELNDSSLQYEAETSQALGFGFRCGFLGMLHMEIIQERIERE 368 

Query: 368 FNIDLIMTAPSVVYHVHTTDEDMIEVSNPSEFPDPTRVAFIEEPYVKAQIMVPQEFVGAV 427 
FNIDLI TAPSV+Y V+ TD + + V NPS PDP ++ +EEPYVKA +MVP ++VGAV 

Sbjct: 369 FNIDLITTAPSVIYDVYMTDGEKVVVDNPSNMPDPQKIERVEEPYVKATMMVPNDYVGAV 428 

Query: 428 MELSQRKRGDFVTMDYIDDNRVNVIYQIPLAEIVFDFFDKLKSSTRGYASFDYDMSEYRR 487 
MEL Q KRG+F+ M Y+D NRV++IY +PLAEIV++FFD+LKSST+GYASFDY++ Y+ 
Sbjct: 429 MELCQGKRGNFIDMQYLDANRVSIIYDMPIAEIVYEFFDQLKSSTKGYASFDYELIGYKP 488 

Query: 488 SQLVKMDILLNGDKVDALSFIVHKEFAYERGKIIVEKLKKIIPRQQFEVPIQAAIGQKIV 547 
S+LVKMDI+LNG+K+DALSFIVH+++AYERGK+IVEKLK++IPRQQFEVP+QAAIGQKIV 

Sbjct: 489 SKLVKMDIMLNGEKIDALSFIVHRDYAYERGKVIVEKLKELIPRQQFEVPVQAAIGQKIV 548 

Query: 548 ARSDIKALRKNVLAKCYGGDVSRKRKLLEKQKAGKKRMKAIGSVEVPQEAFLSVLSMDDD 607 
ARS IKA+RKNVLAKCYGGD+SRKRKLLEKQK GK+RMK +GSVEVPQEAF++VL MDD 
Sbjct: 549 ARSTIKAMRKNVLAKCYGGDISRKRKLLEKQKEGKRRMKQVGSVEVPQEAFMAVLKMDDS 608 

Query: 608 TKK 610 
KK 

Sbjct: 609 PKK 611 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 587/610 (96%), Positives = 601/610 (98%) 

Query: 1 MNIEDLKKRQEKIRNFSIIAHIDHGKSTLADRILEKTETVSSREMQAQLLDSMDLERERG 60 
MN +DLKKRQEKIRNFSIIAHIDHGKSTLADRILEKTETVSSREMQAQLLDSMDLERERG 

Sbjct: 1 MNSQDLKKRQEKIRNFSIIAHIDHGKSTLADRILEKTETVSSREMQAQLLDSMDLERERG 60 

Query: 61 ITIKLNAIELNYTAKDGETYIFHLIDTPGHVDFTYEVSRSLAACEGAILVVDAAQGIEAQ 120 
ITIKLNAIELNYTAKDGETYIFHLIDTPGHVDFTYEVSRSLAACEGAILVVDAAQGIEAQ 
Sbjct: 61 ITIKLNAIELNYTAKDGETYIFHLIDTPGHVDFTYEVSRSIAACEGAILVVDAAQGIEAQ 120 

Query: 121 TLANVYLALDNDLEILPVINKIDLPAADPERVRAEVEDVIGLDASEAVLASAKAGIGIEE 180 

TLANVYLALDNDLEILPVINKIDLPAADPERVR EVEDVIGLDASEAVLASAKAGIGIEE 
Sbjct: 121 TLANVYLALDNDLEILPVINKIDLPAADPERVRHEVEDVIGLDASEAVLASAKAGIGIEE 180 

Query: 181 ILEQIVEKVPAPTGEVDAPLQALIFDSVYDAYRGVILQVRIVNGMVKPGDKIQMMSNGKT 240 

ILEQIVEKVPAPTG+VDAPLQALIFDSVYDAYRGVILQVRIVNG+VKPGDKIQMMSNGKT 
Sbjct: 181 ILEQIVEKVPAPTGDVDAPLQALIFDSVYDAYRGVILQVRIVNGIVKPGDKIQMMSNGKT 240 

Query: 241 FDVTEVGIFTPKAVGRDFLATGDVGYIAASIKTVADTRVGDTITLANNPAIEPLHGYKQM 300 

FDVTEVGIFTPKAVGRDFLATGDVGY+AASIKTVADTRVGDT+TLANNPA E LHGYKQM 
Sbjct: 241 FDVTEVGIFTPKAVGRDFLATGDVGYVAASIKTVADTRVGDTVTLANNPAKEALHGYKQM 300 

Query: 301 NPMVFAGLYPIESNKYNDLREALEKLQLNDASLQFEPETSQALGFGFRCGFLGLLHMDVI 360 
NPMVFAG+YPIESNKYNDLREALEKLQLNDASLQFEPETSQALGFGFRCGFLGLLHMDVI 

Sbjct: 301 NPMVFAGIYPIESNKYNDLREALEKLQLNDASLQFEPETSQALGFGFRCGFLGLLHMDVI 360 

Query: 361 QERLEREFNIDLIMTAPSVVYHVNTTDGEMLEVSNPSEFPDPTRVDSIEEPYVKAQIMVP 420 
QERLEREFNIDLIMTAPSVVYHV+TTD +M+EVSNPSEFPDPTRV IEEPYVKAQIMVP 
Sbjct: 361 QERLEREFNIDLIMTAPSVVYHVHTTDEDMIEVSNPSEFPDPTRVAFIEEPYVKAQIMVP 420 

Query: 421 QEFVGAVMELAQRKRGDFVTMDYIDDNRVNVIYQIPLAEIVFDFFDKLKSSTRGYASFDY 480 

QEFVGAVMEL+QRKRGDFVTMDYIDDNRVNVIYQIPLAEIVFDFFDKLKSSTRGYASFDY 
Sbjct: 421 QEFVGAVMELSQRKRGDFVTMDYIDDNRVNVIYQIPLAEIVFDFFDKLKSSTRGYASFDY 480 

Query: 481 EISEYRRSQLXKMDILLNGDKVDALSFIVHKEFAYERGKLIVDKLKKIIPRQQFEVPIQA 540 

++SEYRRSQL KMDILLNGDKVDALSFIVHKEFAYERGK+IV+KLKKIIPRQQFEVPIQA 
Sbjct: 481 DMSEYRRSQLVKMDILLNGDKVDALSFIVHKEFAYERGKIIVEKLKKIIPRQQFEVPIQA 540 

Query: 541 AIGQKIVARSDIKALRKNVLAKCYGGDVSRKRKLLEKQKAGKKRMKAIGSVEVPQEAFLS 600 

AIGQKIVARSDIKALRKNVLAKCYGGDVSRKRKLLEKQKAGKKRMKAIGSVEVPQEAFLS 
Sbjct: 541 AIGQKIVARSDIKALRKNVLAKCYGGDVSRKRKLLEKQKAGKKRMKAIGSVEVPQEAFLS 600 

Query: 601 VLSMDDDDKK 610 
VLSMDDD KK 

Sbjct: 601 VLSMDDDTKK 610 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
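
The summary lines that head each alignment above (for example "Identities = 587/610 (96%)") can be checked mechanically. The following is a minimal sketch in Python, not part of the original analysis: the function name identity_percent and the regular expression are hypothetical, and truncating integer division is an assumption about the rounding, chosen because it reproduces the identity percentages quoted in this section.

```python
import re

# Hypothetical helper: parse an "Identities = a/b (p%)" field and recompute
# the percentage from the raw counts.
SUMMARY = re.compile(r"Identities\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")


def identity_percent(line):
    """Return (matches, aligned_length, reported_pct, recomputed_pct)."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("no 'Identities = a/b (p%)' field found")
    matches, length, reported = (int(g) for g in m.groups())
    # Assumption: the reported figure is truncated, not rounded to nearest.
    recomputed = 100 * matches // length
    return matches, length, reported, recomputed


if __name__ == "__main__":
    for header in (
        "Identities = 464/603 (76%), Positives = 540/603 (88%)",
        "Identities = 587/610 (96%), Positives = 601/610 (98%)",
    ):
        print(identity_percent(header))
```

A mismatch between the reported and recomputed values would point to a transcription error in the counts rather than in the aligned residues.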
Example 1370 

A DNA sequence (GBSx1455) was identified in S.agalactiae <SEQ ID 4189> which encodes the amino 
acid sequence <SEQ ID 4190>. This protein is predicted to be awd gene product (ndk). Analysis of this 
protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

----- Final Results ----- 

bacterial cytoplasm --- Certainty=0.2097 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF57188 GB:AE003779 awd gene product [Drosophila melanogaster] 
Identities = 73/136 (53%), Positives = 100/136 (72%), Gaps = 5/136 (3%) 

Query: 2 EQTFFMIKPDGVKRGFIGEVISRIERRGFSIDRLEVRYADADILKRHYAELTDRPFFPTL 61 

E+TF M+KPDGV+RG +G++I R E++GF + L+ +A ++L++HYA+L+ RPFFP L 
Sbjct: 25 ERTFIMVKPDGVQRGLVGKIIERFEQKGFKLVALKFTWASKELLEKHYADLSARPFFPGL 84 

Query: 62 VDYMTSGPVIIGVISGEEVISTWRTMMGSTNPKDALPGTIRGDFAQAPSPNQATCNIVHG 121 

V+YM SGPV+ V G V+ T R M+G+TNP D+LPGTIRGDF Q NI+HG 

Sbjct: 85 VNYMNSGPVVPMVWEGLNVVKTGRQMLGATNPADSLPGTIRGDFC-----IQVGRNIIHG 139 

Query: 122 SDSPESATREIAIWFN 137 

SD+ ESA +EIA+WFN 
Sbjct: 140 SDAVESAEKEIALWFN 155 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4191> which encodes the amino acid 
sequence <SEQ ID 4192>. Analysis of this protein sequence reveals the following: 
Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

----- Final Results ----- 

bacterial cytoplasm --- Certainty=0.2913 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 30/48 (62%), Positives = 35/48 (72%) 

Query: 87 MMGSTNPKDALPGTIRGDFAQAPSPNQATCNIVHGSDSPESATREIAI 134 

MM TNPKDAL GTIR +FAQAP + N+VHGS S +SA REIA+ 
Sbjct: 1 MMRVTNPKDALCGTIRENFAQAPGDDGGIFNMVHGSHSRDSARREIAL 48 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
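
The "Final Results" blocks in these examples list a certainty value for each of three predicted localisations. As an illustration only, the Python sketch below picks the highest-scoring localisation from such a block. The function name best_localisation and the regular expression are hypothetical, and the line layout is assumed from the text of this section; stray spaces inside the number (as in "0 . 0000") are tolerated and removed before conversion.

```python
import re

# Hypothetical pattern for one "Final Results" line, with or without the
# "---" separator, and with optional spaces inside the certainty value.
LINE = re.compile(
    r"^\s*(bacterial \w+)\s*-*\s*Certainty\s*=\s*([0-9][0-9. ]*)\(([^)]*)\)"
)


def best_localisation(lines):
    """Return (localisation, certainty, label) with the highest certainty."""
    parsed = []
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue  # skip headers, blank lines and anything unrecognised
        name, value, label = m.groups()
        parsed.append((name, float(value.replace(" ", "")), label.strip()))
    if not parsed:
        raise ValueError("no localisation lines found")
    return max(parsed, key=lambda item: item[1])


if __name__ == "__main__":
    # Values taken from the S.agalactiae block of Example 1370.
    block = [
        "bacterial cytoplasm --- Certainty=0.2097 (Affirmative) < succ>",
        "bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>",
        "bacterial outside --- Certainty=0.0000 (Not Clear) < succ>",
    ]
    print(best_localisation(block))
```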

Example 1371 

A DNA sequence (GBSx1456) was identified in S.agalactiae <SEQ ID 4193> which encodes the amino 
acid sequence <SEQ ID 4194>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

----- Final Results ----- 

bacterial cytoplasm --- Certainty=0.2734 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 



